I understand that, for example, maximizing the log-likelihood is equivalent to minimizing the negative log-likelihood. It is indeed a simple change, but it is still an extra step, taken (it seems) for the sole purpose of designing a loss function that will be minimized instead of maximized.

I wonder **why** this has become the standard in Machine Learning?

- Is there any numerical consideration that favors function minimization instead of maximization?
- Why has gradient descent become such a universal standard? (I have never seen a Deep Learning paper in which they use gradient ascent to directly maximize the likelihood)

**Disclaimer:**

I came across many similar questions, but none of them has been truly answered. People typically just explain how both approaches are equivalent, or explain why we use the logarithm for numerical stability, but without explaining **why** minimization is favored over maximization. (See those two questions: 1, 2)


#### Best Answer

It's my understanding that the only reason for this distinction is that in numerical analysis it is standard to talk about convex optimization rather than concave optimization, even though they are really the same procedure. For example, if you do a Google Scholar search for "concave optimization", you get about 300,000 hits, but "convex optimization" gets about 2,000,000.

Because convex optimization is talked about more in the numerical analysis literature, this nomenclature is followed in the machine learning community.

As you state, the differences are trivial, so the reason for the distinction is trivial.
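To make the "same procedure" point concrete, here is a minimal sketch comparing gradient ascent on a log-likelihood with gradient descent on the negative log-likelihood. It assumes a toy Bernoulli model parameterized by a logit `theta`; the data, learning rate, step count, and function names are all made up for illustration.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def grad_log_likelihood(theta, data):
    # Gradient of the Bernoulli log-likelihood w.r.t. the logit theta:
    # d/dtheta sum_i [y_i*log(p) + (1-y_i)*log(1-p)] = sum_i (y_i - p), p = sigmoid(theta)
    p = sigmoid(theta)
    return sum(y - p for y in data)

def grad_neg_log_likelihood(theta, data):
    # The loss is just the negated objective, so its gradient is negated too.
    return -grad_log_likelihood(theta, data)

def gradient_ascent(theta, data, lr=0.1, steps=200):
    # Maximize the log-likelihood: step along the gradient.
    path = [theta]
    for _ in range(steps):
        theta = theta + lr * grad_log_likelihood(theta, data)
        path.append(theta)
    return path

def gradient_descent(theta, data, lr=0.1, steps=200):
    # Minimize the negative log-likelihood: step against the gradient.
    path = [theta]
    for _ in range(steps):
        theta = theta - lr * grad_neg_log_likelihood(theta, data)
        path.append(theta)
    return path

# Hypothetical coin-flip data: 6 heads out of 8, so the MLE of p is 0.75.
data = [1, 0, 1, 1, 0, 1, 1, 1]

ascent_path = gradient_ascent(0.0, data)
descent_path = gradient_descent(0.0, data)

# Largest difference between the two sequences of iterates.
max_gap = max(abs(a - d) for a, d in zip(ascent_path, descent_path))
print("max gap between iterates:", max_gap)
print("estimated p:", sigmoid(ascent_path[-1]))
```

Both loops visit the same iterates, because `theta + lr * g` and `theta - lr * (-g)` are the same update. The only thing that differs is the sign convention, which is consistent with the answer's claim that the choice is one of nomenclature.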
