I'm trying to prove that the Maximum Likelihood Estimator is asymptotically normally distributed.
I'm stuck at the last steps. Here's what I've done:
I take the Taylor expansion of the mean of the score function,
$$\frac{1}{n}\sum \frac{\partial \log f(x_i, \theta)}{\partial \theta}.$$
The Taylor expansion around the true, unknown value $\theta_0$ is:
$$
\left.\frac{1}{n}\sum \frac{\partial \log f(x_i, \theta)}{\partial \theta}\right\vert_{\theta_0}+ \left.\frac{1}{n}\sum \frac{\partial^2 \log f(x_i, \theta)}{\partial \theta^2}\right\vert_{\theta_0}(\theta-\theta_0) +R/n
$$
By the Weak Law of Large Numbers, each of these sample means approximates the corresponding expected value: the first term goes to $0$, the second goes to $-I(\theta_0)$, and the third goes to $0$ under the assumptions on the form of the remainder.
Now my problem is that I was told to evaluate at the ML estimate $\hat\theta$ and do the Taylor expansion again, but I didn't follow all the steps.
I only know that in the end we get this:
$$
(\hat{\theta}-\theta_0)=\left[\frac{1}{\sqrt{n}}\sum \frac{\partial^2 \log f(x_i, \theta)}{\partial \theta^2}\right]^{-1}\left[\frac{1}{\sqrt{n}}\sum \frac{\partial \log f(x_i, \theta)}{\partial \theta}+ R/n\right]
$$
$ \sqrt{n}(\hat\theta-\theta_0) \sim N(0,I^{-1}(\theta_0)) $ asymptotically.
I know that we have to use the Central Limit Theorem, but I'm quite confused and I don't know how to go on. I tried to find more information, but with no results.
Can someone give me a clear explanation of why the MLE is asymptotically normal? Thank you.
Best Answer
The log-likelihood function is $$l(\theta)=\sum_{i=1}^n \log(f(x_i)) \tag{1}$$ Since $\hat{\theta}$ maximizes the log-likelihood function $l(\theta)$, we know that $l'(\hat{\theta})=0$.
Next we do a Taylor expansion of $l'(\hat{\theta})$ around $\theta_0$, with the remainder evaluated at some $\tilde{\theta}$ between $\hat{\theta}$ and $\theta_0$:
$$l'(\hat{\theta})=l'(\theta_0)+\frac{l''(\theta_0)}{1!}(\hat{\theta}-\theta_0)+\frac{l'''(\tilde{\theta})}{2!}(\hat{\theta}-\theta_0)^2$$
Since $l'(\hat{\theta})=0$, we can rearrange:
$$-l''(\theta_0)(\hat{\theta}-\theta_0)-\frac{l'''(\tilde{\theta})}{2}(\hat{\theta}-\theta_0)^2=l'(\theta_0)$$ $$(\hat{\theta}-\theta_0)=\frac{l'(\theta_0)}{-l''(\theta_0)-\frac{l'''(\tilde{\theta})}{2}(\hat{\theta}-\theta_0)}$$
Multiplying both sides by $\sqrt{n}$, and dividing the numerator and denominator on the right by $n$, we get
$$\sqrt{n}(\hat{\theta}-\theta_0)=\frac{\frac{1}{\sqrt{n}}l'(\theta_0)}{-\frac{1}{n}l''(\theta_0)-\frac{l'''(\tilde{\theta})}{2n}(\hat{\theta}-\theta_0)} \tag{2}$$
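To make $(2)$ concrete, take, for instance, the Exponential density $f(x)=\theta e^{-\theta x}$ (an illustrative choice of mine, not needed for the argument). There $l(\theta)=n\log\theta-\theta\sum x_i$, so
$$l'(\theta)=\frac{n}{\theta}-\sum x_i,\qquad l''(\theta)=-\frac{n}{\theta^2},\qquad l'''(\theta)=\frac{2n}{\theta^3},$$
and every piece of $(2)$ can be written out explicitly; note $-\frac{1}{n}l''(\theta_0)=\frac{1}{\theta_0^2}$, which we will recognize below as $I(\theta_0)$.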
Next we need to show that $\frac{1}{\sqrt{n}} l'(\theta_0)$ asymptotically has a $N(0,I(\theta_0))$ distribution.
From $(1)$ we get
$$l'(\theta_0)=\sum_{i=1}^n\frac{\partial \log(f(x_i))}{\partial \theta_0}$$
We multiply both sides by $\frac{1}{\sqrt{n}}$:
$$\frac{1}{\sqrt{n}}l'(\theta_0)=\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{\partial \log(f(x_i))}{\partial \theta_0} \tag{3}$$
Now we use the CLT on the right-hand side of $(3)$, treating each $\frac{\partial \log(f(x_i))}{\partial \theta_0}$ as an i.i.d. random variable.
We can show that $E\left(\frac{\partial \log(f(x_i))}{\partial \theta_0}\right)=0$ by the following procedure:
$$1=\int_{-\infty}^{\infty}f(x)\,dx$$ Take the derivative of both sides (assuming we may differentiate under the integral sign):
$$0=\int_{-\infty}^{\infty}\frac{\partial f(x)}{\partial \theta_0}\,dx=\int_{-\infty}^{\infty}\frac{1}{f(x)}\frac{\partial f(x)}{\partial \theta_0}f(x)\,dx=\int_{-\infty}^{\infty}\frac{\partial \log(f(x))}{\partial \theta_0}f(x)\,dx,$$
which shows that $E\left(\frac{\partial \log(f(x_i))}{\partial \theta_0}\right)=0$.
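As a concrete sanity check, with the Exponential density $f(x)=\theta_0 e^{-\theta_0 x}$ from the illustration above, the score is
$$\frac{\partial \log(f(x))}{\partial \theta_0}=\frac{\partial}{\partial \theta_0}\left(\log\theta_0-\theta_0 x\right)=\frac{1}{\theta_0}-x,$$
and indeed $E\left(\frac{1}{\theta_0}-X\right)=\frac{1}{\theta_0}-\frac{1}{\theta_0}=0$.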
We can also show that the variance of $\frac{\partial \log(f(x_i))}{\partial \theta_0}$ is $I(\theta_0)$.
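A minimal sketch of that step (again assuming we may differentiate under the integral sign): differentiating the identity $\int_{-\infty}^{\infty}\frac{\partial \log(f(x))}{\partial \theta_0}f(x)\,dx=0$ once more gives
$$0=\int_{-\infty}^{\infty}\left[\frac{\partial^2 \log(f(x))}{\partial \theta_0^2}+\left(\frac{\partial \log(f(x))}{\partial \theta_0}\right)^2\right]f(x)\,dx,$$
so $E\left[\left(\frac{\partial \log(f(x_i))}{\partial \theta_0}\right)^2\right]=-E\left[\frac{\partial^2 \log(f(x_i))}{\partial \theta_0^2}\right]=I(\theta_0)$; since the mean of the score is $0$, this second moment is exactly its variance.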
Therefore, $$\frac{1}{\sqrt{n}}l'(\theta_0)\sim N(0,I(\theta_0))$$
We can also show that $-\frac{1}{n}l''(\theta_0)$ converges to $I(\theta_0)$. I will not do the detailed derivation here.
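(A quick sketch of the omitted step: by the Weak Law of Large Numbers applied to the i.i.d. second derivatives,
$$-\frac{1}{n}l''(\theta_0)=-\frac{1}{n}\sum_{i=1}^n\frac{\partial^2 \log(f(x_i))}{\partial \theta_0^2}\to -E\left[\frac{\partial^2 \log(f(x_i))}{\partial \theta_0^2}\right]=I(\theta_0),$$
where the last equality is the identity from the variance sketch above.)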
We also ignore the $-\frac{l'''(\tilde{\theta})}{2n}(\hat{\theta}-\theta_0)$ part in $(2)$: since $\hat{\theta}$ is consistent, $\hat{\theta}-\theta_0\to 0$, and under the usual regularity conditions $\frac{1}{n}l'''(\tilde{\theta})$ stays bounded, so this term vanishes asymptotically.
Now we wrap up $(2)$ using Slutsky's theorem:
$$\sqrt{n}(\hat{\theta}-\theta_0) \sim \frac{N(0,I(\theta_0))}{I(\theta_0)}=N\left(0,\frac{1}{I(\theta_0)}\right)$$
By some rearrangement, you can see that $\hat{\theta}$ is also asymptotically normally distributed, with $\hat{\theta}\approx N\left(\theta_0,\frac{1}{n\,I(\theta_0)}\right)$ for large $n$.
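As a quick numerical sanity check (my own illustrative setup, again using the Exponential model, where $\hat{\theta}=1/\bar{x}$ and $I(\theta)=1/\theta^2$), a short simulation shows $\sqrt{n}(\hat{\theta}-\theta_0)$ behaving like $N(0,\theta_0^2)$:

```python
import numpy as np

# Illustrative model (an assumption for this check, not part of the proof):
# X ~ Exponential with rate theta0, density f(x) = theta0 * exp(-theta0 * x).
# The MLE is theta_hat = 1 / mean(x) and the Fisher information is I(theta) = 1/theta^2,
# so sqrt(n) * (theta_hat - theta0) should be approximately N(0, theta0^2).
rng = np.random.default_rng(0)
theta0, n, reps = 2.0, 1_000, 5_000

samples = rng.exponential(scale=1.0 / theta0, size=(reps, n))  # NumPy uses scale = 1/rate
theta_hat = 1.0 / samples.mean(axis=1)                         # MLE for each replication
z = np.sqrt(n) * (theta_hat - theta0)                          # centered and scaled

print("empirical mean of z:", z.mean())  # should be close to 0
print("empirical var  of z:", z.var())   # should be close to theta0**2 = 4.0
```

The printed variance should land near $\theta_0^2=1/I(\theta_0)=4$, matching the $N\left(0,\frac{1}{I(\theta_0)}\right)$ limit derived above.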