Here's a question from my problem sheet.
For the normal linear model, verify that the MLEs $\hat{\boldsymbol{\beta}}$ and $\tilde{\sigma}^2$ are maximal values for $\ell(\boldsymbol{\beta}, \sigma^2; \mathbf{y})$ with respect to $\boldsymbol{\beta}$ and $\sigma^2$, where $\ell$ denotes the log-likelihood. What is the maximum value of the likelihood $L(\boldsymbol{\beta}, \sigma^2; \mathbf{y})$? That is: compute $\max_{\boldsymbol{\beta},\sigma^2} L(\boldsymbol{\beta},\sigma^2;\mathbf{y})$, where $\mathbf{y}$ is the vector of observations.
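For reference, I am assuming the standard setup $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ with design matrix $X \in \Re^{n \times p}$, $\boldsymbol{\beta} \in \Re^{p \times 1}$ and $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2 I_n)$, so the log-likelihood is
$$\ell(\boldsymbol{\beta},\sigma^2;\mathbf{y}) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(\mathbf{y}-X\boldsymbol{\beta})^T(\mathbf{y}-X\boldsymbol{\beta}).$$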
I have tried to solve this question, but I am confused by the solution, mostly the Hessian.
The Hessian $H(\boldsymbol{\beta},\sigma^2)$ is given by
$$\begin{pmatrix}
\dfrac{\partial}{\partial \boldsymbol{\beta}^T} \left[ \dfrac{\partial \ell(\boldsymbol{\beta},\sigma^2;\mathbf{y})}{\partial \boldsymbol{\beta}} \right] & \dfrac{\partial}{\partial \sigma^2} \left[ \dfrac{\partial \ell(\boldsymbol{\beta},\sigma^2;\mathbf{y})}{\partial \boldsymbol{\beta}}\right] \\
\dfrac{\partial}{\partial \boldsymbol{\beta}^T} \left[ \dfrac{\partial \ell(\boldsymbol{\beta},\sigma^2;\mathbf{y})}{\partial \sigma^2} \right] & \dfrac{\partial}{\partial \sigma^2} \left[ \dfrac{\partial \ell(\boldsymbol{\beta},\sigma^2;\mathbf{y})}{\partial \sigma^2} \right]
\end{pmatrix}$$
according to the answer.
I have two questions:
How do I know when I need to use the transpose? E.g. why isn't the (1,1) entry of the Hessian just $\dfrac{\partial}{\partial \boldsymbol{\beta}}\left[\dfrac{\partial \ell(\boldsymbol{\beta},\sigma^2;\mathbf{y})}{\partial \boldsymbol{\beta}}\right]$?
Why does the (2,1) entry of the Hessian have to have the partial derivative with respect to $\boldsymbol{\beta}^{T}$ on the outside, rather than with respect to $\boldsymbol{\beta}$?
Best Answer
You have to remember that since $\boldsymbol{\beta} \in \Re^{p \times 1}$ is a vector, the partial derivatives you described are vectors and matrices. In particular, the Hessian is
$$H(\boldsymbol{\beta}, \sigma^2) = \begin{pmatrix} \frac{\partial}{\partial \boldsymbol{\beta}^{T}}\left[\frac{\partial \ell(\boldsymbol{\beta},\sigma^{2};\mathbf{y})}{\partial \boldsymbol{\beta}}\right] & \frac{\partial}{\partial \sigma^{2}}\left[\frac{\partial \ell(\boldsymbol{\beta},\sigma^{2};\mathbf{y})}{\partial \boldsymbol{\beta}}\right] \\ \frac{\partial}{\partial \boldsymbol{\beta}^{T}}\left[\frac{\partial \ell(\boldsymbol{\beta},\sigma^{2};\mathbf{y})}{\partial \sigma^{2}}\right] & \frac{\partial}{\partial \sigma^{2}}\left[\frac{\partial \ell(\boldsymbol{\beta},\sigma^{2};\mathbf{y})}{\partial \sigma^{2}}\right] \end{pmatrix} \in \Re^{(p+1) \times (p+1)}$$
and since $\boldsymbol{\beta}$ is a column vector, we have
$$\frac{\partial}{\partial \boldsymbol{\beta}^{T}}\Big[\frac{\partial \ell(\boldsymbol{\beta},\sigma^{2};\mathbf{y})}{\partial \boldsymbol{\beta}}\Big] \in \Re^{p \times p} \text{, a matrix,}$$
$$\frac{\partial}{\partial \sigma^{2}}\Big[\frac{\partial \ell(\boldsymbol{\beta},\sigma^{2};\mathbf{y})}{\partial \boldsymbol{\beta}}\Big] \in \Re^{p \times 1} \text{, a column vector,}$$
$$\frac{\partial}{\partial \boldsymbol{\beta}^{T}}\Big[\frac{\partial \ell(\boldsymbol{\beta},\sigma^{2};\mathbf{y})}{\partial \sigma^{2}}\Big] \in \Re^{1 \times p} \text{, a row vector,}$$
$$\frac{\partial}{\partial \sigma^{2}}\Big[\frac{\partial \ell(\boldsymbol{\beta},\sigma^{2};\mathbf{y})}{\partial \sigma^{2}}\Big] \in \Re^{1 \times 1} \text{, a scalar.}$$
So we simply take partial derivatives w.r.t. $\boldsymbol{\beta}$ or $\boldsymbol{\beta}^T$ so that the blocks "fit" together in one matrix. To visualize it better,
$$\frac{\partial}{\partial \boldsymbol{\beta}^{T}}\Big[\frac{\partial \ell(\boldsymbol{\beta},\sigma^{2};\mathbf{y})}{\partial \boldsymbol{\beta}}\Big] = \frac{\partial}{\partial \boldsymbol{\beta}^{T}} \begin{pmatrix} \frac{\partial \ell(\boldsymbol{\beta},\sigma^{2};\mathbf{y})}{\partial \beta_1} \\ \vdots \\ \frac{\partial \ell(\boldsymbol{\beta},\sigma^{2};\mathbf{y})}{\partial \beta_p} \end{pmatrix} = \begin{pmatrix} \frac{\partial^2 \ell}{\partial \beta_1 \partial \beta_1} & \frac{\partial^2 \ell}{\partial \beta_1 \partial \beta_2} & \dots & \frac{\partial^2 \ell}{\partial \beta_1 \partial \beta_p} \\ \frac{\partial^2 \ell}{\partial \beta_2 \partial \beta_1} & \ddots & & \vdots \\ \vdots & & & \\ \frac{\partial^2 \ell}{\partial \beta_p \partial \beta_1} & \dots & & \frac{\partial^2 \ell}{\partial \beta_p \partial \beta_p} \end{pmatrix}$$
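For concreteness, with the log-likelihood of the normal linear model written out above, the blocks evaluate to the following (a routine computation, writing $r = \mathbf{y} - X\boldsymbol{\beta}$ for the residual vector):
$$\frac{\partial^2 \ell}{\partial \boldsymbol{\beta}\,\partial \boldsymbol{\beta}^T} = -\frac{1}{\sigma^2}X^T X, \qquad \frac{\partial^2 \ell}{\partial \sigma^2\,\partial \boldsymbol{\beta}} = -\frac{1}{\sigma^4}X^T r, \qquad \frac{\partial^2 \ell}{\partial (\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{r^T r}{\sigma^6}.$$
At the MLE we have $X^T \hat{r} = \mathbf{0}$ and $\tilde{\sigma}^2 = \hat{r}^T \hat{r}/n$, so the Hessian becomes block diagonal with blocks $-X^TX/\tilde{\sigma}^2$ and $-n/(2\tilde{\sigma}^4)$, which is negative definite provided $X$ has full column rank; that is exactly the verification the problem sheet asks for.

If you want to sanity-check the block layout and these formulas numerically, here is a minimal sketch (simulated data; the names `loglik` and `hessian` are my own, and `p` plays the role of the number of coefficients):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3                           # sample size, number of coefficients
X = rng.normal(size=(n, p))            # hypothetical design matrix
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=1.5, size=n)

def loglik(theta):
    """Log-likelihood of the normal linear model; theta = (beta, sigma^2)."""
    beta, s2 = theta[:p], theta[p]
    r = y - X @ beta
    return -0.5 * n * np.log(2 * np.pi * s2) - 0.5 * (r @ r) / s2

def hessian(theta):
    """Analytic Hessian in the (p+1) x (p+1) block layout from the answer."""
    beta, s2 = theta[:p], theta[p]
    r = y - X @ beta
    H = np.empty((p + 1, p + 1))
    H[:p, :p] = -(X.T @ X) / s2                    # p x p block
    H[:p, p] = H[p, :p] = -(X.T @ r) / s2**2       # mixed beta / sigma^2 blocks
    H[p, p] = 0.5 * n / s2**2 - (r @ r) / s2**3    # scalar sigma^2 block
    return H

# MLEs: OLS fit for beta, mean squared residual for sigma^2
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
theta_hat = np.append(beta_hat, np.mean((y - X @ beta_hat) ** 2))

# independent check via central finite differences of the log-likelihood
eps, I = 1e-4, np.eye(p + 1)
H_num = np.array([[(loglik(theta_hat + eps * (I[i] + I[j]))
                    - loglik(theta_hat + eps * (I[i] - I[j]))
                    - loglik(theta_hat - eps * (I[i] - I[j]))
                    + loglik(theta_hat - eps * (I[i] + I[j]))) / (4 * eps**2)
                   for j in range(p + 1)] for i in range(p + 1)])

print(np.allclose(hessian(theta_hat), H_num, rtol=1e-4, atol=1e-4))  # expect True
```

Running it should print `True`: the analytic $(p+1)\times(p+1)$ Hessian matches the finite-difference approximation at the MLE, and its diagonal blocks there are negative (definite), confirming a maximum.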