I am trying to plan out a study plan for learning MLE. In order to do this I am trying to figure out what is the minimum level of calculus that is necessary to understand MLE.

Is it sufficient to understand the basics of calculus (i.e. finding the minimum and maximum of functions) in order to understand MLE?


#### Best Answer

To expand on my comment – it depends. If you're only trying to comprehend the basics, being able to find extrema of functions gets you a fair way (though in many practical cases of MLE, the likelihood is maximized numerically, in which case you need some other skills as well as some basic calculus).

I'll leave aside the nice simple cases where you get explicit algebraic solutions. Even so, calculus is often very useful.

I'll assume independence throughout. Let's take the simplest possible case of 1-parameter optimization. First we'll look at a case where we can take derivatives and separate out a function of the parameter and a statistic.

Consider the $\mathrm{Gamma}(\alpha,1)$ density

$$ f_X(x;\alpha) = \frac{1}{\Gamma(\alpha)}\, x^{\alpha-1} \exp(-x)\,, \qquad x>0,\ \alpha>0 $$

Then for a sample of size $n$, the likelihood is

$$ \mathcal{L}(\alpha; \mathbf{x}) = \prod_{i=1}^n f_X(x_i;\alpha) $$

and so the log-likelihood is $$ \ell(\alpha; \mathbf{x}) = \sum_{i=1}^n \ln f_X(x_i;\alpha) \\ = \sum_{i=1}^n \ln\left(\frac{1}{\Gamma(\alpha)}\, x_i^{\alpha-1} \exp(-x_i)\right) $$ $$ = \sum_{i=1}^n \left(-\ln\Gamma(\alpha)+(\alpha-1)\ln x_i - x_i\right) $$ $$ = -n\ln\Gamma(\alpha)+(\alpha-1)S_x - n\bar{x} $$ where $S_x=\sum_{i=1}^n \ln x_i$. Taking derivatives,
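To see that this algebra checks out, here's a quick numerical sanity check (a sketch with a made-up sample; `math.lgamma` from the standard library computes $\ln\Gamma$):

```python
import math

def loglik_direct(alpha, xs):
    """Sum of Gamma(alpha, 1) log-densities, term by term."""
    return sum(-math.lgamma(alpha) + (alpha - 1) * math.log(x) - x for x in xs)

def loglik_simplified(alpha, xs):
    """The simplified form: -n ln Gamma(alpha) + (alpha - 1) S_x - n xbar."""
    n = len(xs)
    S_x = sum(math.log(x) for x in xs)
    xbar = sum(xs) / n
    return -n * math.lgamma(alpha) + (alpha - 1) * S_x - n * xbar

xs = [0.5, 1.2, 2.7, 0.9]  # hypothetical sample
print(abs(loglik_direct(2.0, xs) - loglik_simplified(2.0, xs)))  # agrees to rounding error
```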

$$ \frac{d}{d\alpha}\,\ell(\alpha; \mathbf{x}) = \frac{d}{d\alpha}\left(-n\ln\Gamma(\alpha)+(\alpha-1)S_x - n\bar{x}\right) $$ $$ = -n\,\frac{\Gamma'(\alpha)}{\Gamma(\alpha)}+S_x $$ $$ = -n\,\psi(\alpha)+S_x $$

So if we set that to zero and try to solve for $\hat{\alpha}$, we get this: $$ \psi(\hat{\alpha})=\ln G(\mathbf{x}) $$

where $psi(cdot)$ is the digamma function and $G(cdot)$ is the geometric mean. We must not forget that in general you can't just set the derivative to zero and be confident you will locate the argmax; you still have to show in some way that the solution is a maximum (in this case it is). More generally, you may get minima, or horizontal points of inflexion, and even if you have a local maximum, you may not have a global maximum (which I touch on near the end).

So our task is now to find the value of $hat{alpha}$ for which

$$ \psi(\hat{\alpha})=g $$

where $g=\ln G(\mathbf{x})$.

This doesn't have a solution in terms of elementary functions; it must be computed numerically. At least we were able to get a function of the parameter on one side and a function of the data on the other. There are various zero-finding algorithms that might be used when you don't have an explicit way of solving the equation (even if you don't have derivatives, there's bisection, for example).
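To illustrate that numerical step, here's a sketch of solving $\psi(\hat{\alpha})=g$ by bisection, with a hand-rolled digamma (the recurrence $\psi(x)=\psi(x+1)-1/x$ plus an asymptotic series) so that it needs nothing beyond the standard library. In practice you'd reach for a library routine (e.g. scipy's `digamma` and a root-finder) instead:

```python
import math

def digamma(x):
    """Digamma psi(x), via the recurrence psi(x) = psi(x+1) - 1/x to push the
    argument above 6, then the asymptotic series at the shifted argument."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    return result + math.log(x) - 0.5 / x - inv2 * (1/12 - inv2 * (1/120 - inv2 / 252))

def solve_alpha(g, lo=1e-8, hi=1e8):
    """Bisection for psi(alpha) = g; psi is strictly increasing on (0, inf),
    so the bracketed root is unique."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if digamma(mid) < g:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For a sample `xs` you would call `solve_alpha(sum(math.log(x) for x in xs) / len(xs))`, since $g$ is the mean of the $\ln x_i$. Because $\psi$ is monotone, bisection is guaranteed to converge here, just slowly compared with derivative-based root-finders.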

Often, it's not so nice as that. Consider the logistic density with unit scale: $$ f(x; \mu) = \frac{1}{4} \operatorname{sech}^2\!\left(\frac{x-\mu}{2}\right). $$ Neither the argmax of the likelihood nor of the log-likelihood can be readily obtained algebraically – you have to use numerical optimization methods. In this case the function is fairly well behaved, and the Newton-Raphson method should usually suffice to locate the ML estimate of $\mu$. If the derivative were unavailable or Newton-Raphson failed to converge, other numerical optimization methods may be needed, such as golden-section search (this is not intended to be an overview of the best available methods, just a mention of some methods you are more likely to encounter at a basic level).
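As a sketch of Newton-Raphson in this case: for the unit-scale logistic the score works out to $\ell'(\mu)=\sum_i \tanh\!\left(\frac{x_i-\mu}{2}\right)$ and $\ell''(\mu)=-\frac{1}{2}\sum_i \operatorname{sech}^2\!\left(\frac{x_i-\mu}{2}\right)<0$, so the log-likelihood is concave and the iteration behaves well (sample values below are made up):

```python
import math

def logistic_mle(xs, mu=None, tol=1e-10, max_iter=50):
    """Newton-Raphson for the location of a unit-scale logistic density.

    Score:   l'(mu)  =  sum tanh((x_i - mu)/2)
    Hessian: l''(mu) = -(1/2) sum sech^2((x_i - mu)/2)   (always negative)
    """
    if mu is None:
        mu = sorted(xs)[len(xs) // 2]   # start near the sample median
    for _ in range(max_iter):
        u = [(x - mu) / 2 for x in xs]
        score = sum(math.tanh(t) for t in u)
        hess = -0.5 * sum(1 / math.cosh(t) ** 2 for t in u)
        step = score / hess
        mu -= step                      # Newton update: mu - l'/l''
        if abs(step) < tol:
            break
    return mu
```

Starting from the sample median is a cheap, robust initial value for a location parameter; with a concave log-likelihood the starting point mostly affects how many iterations are needed, not where you end up.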

More generally, you may not even be able to do that much. Consider a Cauchy with median $theta$ and unit scale:

$$ f_X(x;\theta) = \frac{1}{\pi\left(1 + (x-\theta)^2\right)}\,. $$

In general the likelihood here doesn't have a unique local maximum, but several local maxima. If you find *a* local maximum, there may be another, bigger one elsewhere. (Sometimes people focus on identifying the local maximum closest to the median, or some-such.)
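You can see the multimodality directly with a crude grid evaluation (an illustrative sketch using an artificial two-point sample; when the two points are more than 2 apart, the midpoint becomes a local minimum flanked by two local maxima):

```python
import math

def cauchy_loglik(theta, xs):
    """Log-likelihood of a unit-scale Cauchy with median theta (constants dropped)."""
    return -sum(math.log(1 + (x - theta) ** 2) for x in xs)

# Two well-separated points: a local maximum near each, a local minimum between.
xs = [-5.0, 5.0]
grid = [i / 100 for i in range(-800, 801)]   # theta from -8 to 8 in steps of 0.01
ll = [cauchy_loglik(t, xs) for t in grid]
n_local_maxima = sum(
    1 for i in range(1, len(ll) - 1) if ll[i] > ll[i - 1] and ll[i] > ll[i + 1]
)
print(n_local_maxima)   # 2 for this two-point sample
```

A hill-climber started at $\theta=0$ would happily report either mode; nothing about a local search tells you whether the other one is higher.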

It is easy for beginners to assume that once they find a concave turning point they have the argmax of the function, but besides multiple modes (already discussed), there may be maxima that are not associated with turning points at all. Taking derivatives and setting them to zero is not sufficient; consider estimating the parameter of a uniform on $(0,\theta)$, for example.
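The uniform case makes the point concretely: the likelihood is $\theta^{-n}\,\mathbf{1}\{\theta\ge\max_i x_i\}$, which is strictly decreasing on its support, so the maximum sits at the boundary $\hat\theta=\max_i x_i$, where no derivative vanishes (sketch with a made-up sample):

```python
def uniform_mle(xs):
    """MLE of theta for Uniform(0, theta): the likelihood is theta^(-n) for
    theta >= max(xs) and 0 below, so it is maximized at the boundary point
    theta_hat = max(xs) -- not at any zero of a derivative."""
    return max(xs)

def uniform_lik(theta, xs):
    """Likelihood theta^(-n) * 1{theta >= max(xs)}."""
    n = len(xs)
    return theta ** (-n) if theta >= max(xs) else 0.0

xs = [0.2, 0.9, 0.5]          # hypothetical sample
theta_hat = uniform_mle(xs)   # the sample maximum, 0.9
```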

In other cases, the parameter space may be discrete.

Sometimes finding the maximum may be quite involved.

And that's just a sampling of the issues with a single parameter. When you have multiple parameters, things get more involved again.
