For reasons owing to mathematical convenience, when finding MLEs (maximum likelihood estimates), it is often the log-likelihood function—as opposed to the standard likelihood function—which is maximised.

From what I've gathered, this approach is deemed valid as a result of the monotonically increasing nature of the (natural) logarithm function.

My understanding of a monotonically increasing function is: for all $x$ and $y$ (defined on a subset of the reals), if $x \leq y$ then $f(x) \leq f(y)$.

This, however, does not appear to be the case for all log-likelihood functions; for example, for the $\text{Gamma}(3, 5)$ log-likelihood function, if $x = 0.15$ and $y = 0.46$, then $f(x) = -0.59$ and $f(y) = -1.19$.

Clearly, I've misunderstood this concept. Fundamentally, I guess I'm asking can somebody (preferably mathematically) demonstrate why:

$$ \hat{\theta} = \operatorname{argmax}_{\theta} L(\theta) = \operatorname{argmax}_{\theta} \log L(\theta) $$

where $\hat{\theta}$ is the MLE for a given likelihood function.


#### Best Answer

This is a direct consequence of the properties of monotone (increasing) transformations, and the logarithm is monotone increasing. If there exists a value of $\theta$ that maximizes the likelihood function, that same value of $\theta$ will maximize the log-likelihood function. The latter is often preferred because it has better numerical properties, so it is easier to maximize in practice. That is not the only reason: the log-likelihood function also arises frequently in theory.
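To illustrate the numerical point, here is a minimal sketch using only the Python standard library. The setup is my own assumption, not part of the question: 2000 simulated draws from a standard normal, with the mean $\mu$ as the unknown parameter and the variance fixed at 1. The function names `likelihood` and `log_likelihood` are just illustrative.

```python
import math
import random

# Assumed toy data: 2000 draws from N(0, 1), fixed seed for reproducibility.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(2000)]

LOG_NORM_CONST = -0.5 * math.log(2.0 * math.pi)


def log_density(x, mu):
    """Log of the N(mu, 1) density at x."""
    return LOG_NORM_CONST - 0.5 * (x - mu) ** 2


def likelihood(mu):
    """Product of the individual densities (prone to underflow)."""
    prod = 1.0
    for x in data:
        prod *= math.exp(log_density(x, mu))
    return prod


def log_likelihood(mu):
    """Sum of the individual log-densities (numerically well behaved)."""
    return sum(log_density(x, mu) for x in data)


sample_mean = sum(data) / len(data)

print(likelihood(sample_mean))      # underflows to 0.0
print(log_likelihood(sample_mean))  # a finite negative number
```

Every factor in the product is below 0.4, so after 2000 multiplications the likelihood is far smaller than the smallest positive double and is stored as exactly `0.0` for every value of $\mu$. There is nothing left to maximize. The sum of log-densities stays finite and can still be compared across values of $\mu$.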

**EDIT** The answer by @David Grenier shows this using calculus, but calculus is not necessary! Let $f$ be a monotone increasing function, like $\log$. This means that for all $x, y$, if $x \le y$ then $f(x) \le f(y)$. Let $\hat\theta$ be an MLE, so that for all $\theta$, $L(\theta) \le L(\hat\theta)$, where $L$ is the likelihood function.

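Applying $f$ to both sides, we can conclude $f(L(\theta)) \le f(L(\hat\theta))$ for all $\theta$, so $\hat\theta$ also maximizes $f(L(\theta))$. Now, if $f$ (like $\log$) is strictly increasing, it has an inverse function that is also increasing, so the argument can be reversed: any maximizer of $f(L(\theta))$ also maximizes $L(\theta)$. Note that monotonicity is required of $f$ as a function of the likelihood value, not of $\log L(\theta)$ as a function of $\theta$; the log-likelihood itself may well rise and then fall.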
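A quick numerical check of this argument, as a sketch under assumptions of my own: a Bernoulli likelihood with 7 successes in 10 trials, $L(p) = p^7 (1-p)^3$, evaluated on a grid of 999 values of $p$. The example is hypothetical and chosen only because its maximizer, $p = 0.7$, is known in closed form.

```python
import math

# Assumed toy data: 7 successes out of 10 Bernoulli trials.
SUCCESSES = 7
FAILURES = 3


def likelihood(p):
    """Bernoulli likelihood L(p) = p^7 * (1 - p)^3."""
    return p ** SUCCESSES * (1.0 - p) ** FAILURES


def log_likelihood(p):
    """The monotone transform f = log applied to L(p)."""
    return math.log(likelihood(p))


# Grid over the open interval (0, 1), avoiding the endpoints where L(p) = 0.
grid = [i / 1000 for i in range(1, 1000)]

p_hat_lik = max(grid, key=likelihood)
p_hat_loglik = max(grid, key=log_likelihood)

print(p_hat_lik, p_hat_loglik)  # both are 0.7

# The log-likelihood is NOT monotone in p: it rises up to 0.7, then falls.
print(log_likelihood(0.3) < log_likelihood(0.7) > log_likelihood(0.9))  # True
```

Both searches return the same grid point, even though neither $L(p)$ nor $\log L(p)$ is monotone in $p$. What matters is only that $\log$ preserves the ordering of the likelihood values.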
