# Solved – Hessian matrix for maximum likelihood

Here's a question from my problem sheet.

For the normal linear model, verify that the MLEs $\hat{\boldsymbol{\beta}}$ and $\tilde{\sigma}^2$ maximise $\ell(\boldsymbol{\beta}, \sigma^2; \mathbf{y})$ with respect to $\boldsymbol{\beta}$ and $\sigma$, where $\ell$ denotes the log-likelihood. What is the maximum value of the likelihood $L(\boldsymbol{\beta}, \sigma^2; \mathbf{y})$? That is, compute $\max_{\boldsymbol{\beta},\sigma^2} L(\boldsymbol{\beta},\sigma^2;\mathbf{y})$, where $\mathbf{y}$ is the vector of observations.
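A minimal numerical sketch of the closed-form answer, assuming the usual model $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ with $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2 I_m)$ and $X$ of full column rank. It uses the standard results $\hat{\boldsymbol{\beta}} = (X^TX)^{-1}X^T\mathbf{y}$ and $\tilde{\sigma}^2 = (\mathbf{y} - X\hat{\boldsymbol{\beta}})^T(\mathbf{y} - X\hat{\boldsymbol{\beta}})/m$, for which the maximum is $L(\hat{\boldsymbol{\beta}}, \tilde{\sigma}^2;\mathbf{y}) = (2\pi\tilde{\sigma}^2)^{-m/2}e^{-m/2}$. The simulated data, the helper `log_lik`, and the sizes are illustrative assumptions; `m` is the number of observations and `n` the length of $\boldsymbol{\beta}$, matching the answer's $\boldsymbol{\beta} \in \mathbb{R}^{n \times 1}$.

```python
import numpy as np

def log_lik(beta, sigma2, X, y):
    """Log-likelihood of y = X beta + eps with eps ~ N(0, sigma2 * I_m)."""
    m = y.shape[0]
    r = y - X @ beta
    return -0.5 * m * np.log(2 * np.pi * sigma2) - (r @ r) / (2 * sigma2)

# Simulated data for illustration only: m observations, beta of length n
rng = np.random.default_rng(0)
m, n = 50, 3
X = np.column_stack([np.ones(m), rng.normal(size=(m, n - 1))])
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.7, size=m)

# Closed-form MLEs; lstsq solves the least-squares problem min ||y - X beta||^2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
r = y - X @ beta_hat
sigma2_tilde = (r @ r) / m          # the MLE divides by m, not m - n

# Closed-form maximum: l_max = -(m/2) * (log(2 pi sigma2_tilde) + 1)
max_log_lik = -0.5 * m * (np.log(2 * np.pi * sigma2_tilde) + 1)
max_lik = np.exp(max_log_lik)       # = (2 pi sigma2_tilde)^(-m/2) * exp(-m/2)

print(np.isclose(log_lik(beta_hat, sigma2_tilde, X, y), max_log_lik))  # True
```

Perturbing `beta_hat` or `sigma2_tilde` should only lower `log_lik`; this is a quick numerical sanity check of the maximisation, not a proof.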

I have tried to solve this question, but I am confused by the solution, mostly the Hessian.

The solution gives the Hessian $H(\boldsymbol{\beta},\sigma^2)$ as

$$
H(\boldsymbol{\beta},\sigma^2) = \begin{pmatrix}
\dfrac{\partial}{\partial \boldsymbol{\beta}^T}\left[\dfrac{\partial \ell(\boldsymbol{\beta},\sigma^2;\mathbf{y})}{\partial \boldsymbol{\beta}}\right] & \dfrac{\partial}{\partial \sigma^2}\left[\dfrac{\partial \ell(\boldsymbol{\beta},\sigma^2;\mathbf{y})}{\partial \boldsymbol{\beta}}\right] \\
\dfrac{\partial}{\partial \boldsymbol{\beta}^T}\left[\dfrac{\partial \ell(\boldsymbol{\beta},\sigma^2;\mathbf{y})}{\partial \sigma^2}\right] & \dfrac{\partial}{\partial \sigma^2}\left[\dfrac{\partial \ell(\boldsymbol{\beta},\sigma^2;\mathbf{y})}{\partial \sigma^2}\right]
\end{pmatrix}
$$
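For the normal linear model the four blocks can be written out explicitly. This is a worked sketch, assuming $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ with $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2 I_m)$, $m$ observations, $\boldsymbol{\beta} \in \mathbb{R}^{n \times 1}$ (the $n$ used in the answer) and $X$ of full column rank, starting from $\ell(\boldsymbol{\beta},\sigma^2;\mathbf{y}) = -\tfrac{m}{2}\log(2\pi\sigma^2) - \tfrac{1}{2\sigma^2}(\mathbf{y} - X\boldsymbol{\beta})^T(\mathbf{y} - X\boldsymbol{\beta})$:

$$
\begin{aligned}
\frac{\partial \ell}{\partial \boldsymbol{\beta}} &= \frac{X^T(\mathbf{y} - X\boldsymbol{\beta})}{\sigma^2}, &
\frac{\partial \ell}{\partial \sigma^2} &= -\frac{m}{2\sigma^2} + \frac{(\mathbf{y} - X\boldsymbol{\beta})^T(\mathbf{y} - X\boldsymbol{\beta})}{2\sigma^4}, \\
\frac{\partial}{\partial \boldsymbol{\beta}^T}\Big[\frac{\partial \ell}{\partial \boldsymbol{\beta}}\Big] &= -\frac{X^TX}{\sigma^2}, &
\frac{\partial}{\partial \sigma^2}\Big[\frac{\partial \ell}{\partial \boldsymbol{\beta}}\Big] &= -\frac{X^T(\mathbf{y} - X\boldsymbol{\beta})}{\sigma^4}, \\
\frac{\partial}{\partial \boldsymbol{\beta}^T}\Big[\frac{\partial \ell}{\partial \sigma^2}\Big] &= -\frac{(\mathbf{y} - X\boldsymbol{\beta})^TX}{\sigma^4}, &
\frac{\partial}{\partial \sigma^2}\Big[\frac{\partial \ell}{\partial \sigma^2}\Big] &= \frac{m}{2\sigma^4} - \frac{(\mathbf{y} - X\boldsymbol{\beta})^T(\mathbf{y} - X\boldsymbol{\beta})}{\sigma^6}.
\end{aligned}
$$

At the MLEs the normal equations give $X^T(\mathbf{y} - X\hat{\boldsymbol{\beta}}) = \mathbf{0}$, and $(\mathbf{y} - X\hat{\boldsymbol{\beta}})^T(\mathbf{y} - X\hat{\boldsymbol{\beta}}) = m\tilde{\sigma}^2$, so

$$
H(\hat{\boldsymbol{\beta}},\tilde{\sigma}^2) = \begin{pmatrix} -X^TX/\tilde{\sigma}^2 & \mathbf{0} \\ \mathbf{0}^T & -m/(2\tilde{\sigma}^4) \end{pmatrix},
$$

which is negative definite because $X^TX$ is positive definite. The stationary point $(\hat{\boldsymbol{\beta}},\tilde{\sigma}^2)$ is therefore a strict local maximum, and substituting it into $L$ gives $(2\pi\tilde{\sigma}^2)^{-m/2}e^{-m/2}$.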

I have two questions:

1. How do I know when I need to use the transpose? For example, why isn't the (1,1) element of the Hessian matrix just $\dfrac{\partial}{\partial \boldsymbol{\beta}}\left[\dfrac{\partial \ell(\boldsymbol{\beta},\sigma^2;\mathbf{y})}{\partial \boldsymbol{\beta}}\right]$?

2. Why does the (2,1) element of the Hessian have the partial derivative with respect to $\boldsymbol{\beta}^{T}$ on the outside, rather than with respect to $\boldsymbol{\beta}$?

## Answer

Remember that since $\boldsymbol{\beta} \in \mathbb{R}^{n \times 1}$ is a vector, the partial derivatives you describe are vectors and matrices, not just scalars. In particular, the Hessian is

$$H(\boldsymbol{\beta}, \sigma^2) = \begin{pmatrix} \frac{\partial}{\partial \boldsymbol{\beta}^{T}}\Big[\frac{\partial \ell(\boldsymbol{\beta},\sigma^{2};\mathbf{y})}{\partial \boldsymbol{\beta}}\Big] & \frac{\partial}{\partial \sigma^{2}}\Big[\frac{\partial \ell(\boldsymbol{\beta},\sigma^{2};\mathbf{y})}{\partial \boldsymbol{\beta}}\Big] \\ \frac{\partial}{\partial \boldsymbol{\beta}^{T}}\Big[\frac{\partial \ell(\boldsymbol{\beta},\sigma^{2};\mathbf{y})}{\partial \sigma^{2}}\Big] & \frac{\partial}{\partial \sigma^{2}}\Big[\frac{\partial \ell(\boldsymbol{\beta},\sigma^{2};\mathbf{y})}{\partial \sigma^{2}}\Big] \end{pmatrix} \in \mathbb{R}^{(n+1) \times (n+1)}$$

and since $\boldsymbol{\beta}$ is a column vector, we have

$$\frac{\partial}{\partial \boldsymbol{\beta}^{T}}\Big[\frac{\partial \ell(\boldsymbol{\beta},\sigma^{2};\mathbf{y})}{\partial \boldsymbol{\beta}}\Big] \in \mathbb{R}^{n \times n}, \text{ a matrix,}$$

$$\frac{\partial}{\partial \sigma^{2}}\Big[\frac{\partial \ell(\boldsymbol{\beta},\sigma^{2};\mathbf{y})}{\partial \boldsymbol{\beta}}\Big] \in \mathbb{R}^{n \times 1}, \text{ a column vector,}$$

$$\frac{\partial}{\partial \boldsymbol{\beta}^{T}}\Big[\frac{\partial \ell(\boldsymbol{\beta},\sigma^{2};\mathbf{y})}{\partial \sigma^{2}}\Big] \in \mathbb{R}^{1 \times n}, \text{ a row vector.}$$
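The shape bookkeeping can be checked numerically. This is a sketch assuming the normal linear model, with each block computed from its closed form for that model ($-X^TX/\sigma^2$ top-left, $-X^T(\mathbf{y}-X\boldsymbol{\beta})/\sigma^4$ and its transpose off the diagonal, and $m/(2\sigma^4) - (\mathbf{y}-X\boldsymbol{\beta})^T(\mathbf{y}-X\boldsymbol{\beta})/\sigma^6$ bottom-right, with $m$ observations); the random data and parameter values are arbitrary. `np.block` raises an error on mismatched shapes, so it only accepts the four pieces because the column block sits beside the matrix and the row block sits below it.

```python
import numpy as np

# Arbitrary illustrative data: m observations, beta of length n
rng = np.random.default_rng(1)
m, n = 40, 3
X = rng.normal(size=(m, n))
y = rng.normal(size=(m, 1))
beta = rng.normal(size=(n, 1))      # a column vector, as in the answer
sigma2 = 0.8

r = y - X @ beta                    # (m, 1) residual column

H_bb = -(X.T @ X) / sigma2          # d/d(beta^T) [dl/d(beta)]:    (n, n) matrix
H_bs = -(X.T @ r) / sigma2**2       # d/d(sigma^2) [dl/d(beta)]:   (n, 1) column
H_sb = -(r.T @ X) / sigma2**2       # d/d(beta^T) [dl/d(sigma^2)]: (1, n) row
H_ss = np.array([[m / (2 * sigma2**2) - (r.T @ r).item() / sigma2**3]])  # (1, 1)

# Assemble the (n+1) x (n+1) Hessian from the four blocks
H = np.block([[H_bb, H_bs],
              [H_sb, H_ss]])

print(H_bb.shape, H_bs.shape, H_sb.shape, H_ss.shape, H.shape)
# (3, 3) (3, 1) (1, 3) (1, 1) (4, 4)
```

The off-diagonal blocks are transposes of each other, so the assembled matrix is symmetric, as a Hessian should be.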

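So we take partial derivatives with respect to $\boldsymbol{\beta}$ or $\boldsymbol{\beta}^T$ so that the blocks have compatible shapes and "fit" together in one $(n+1) \times (n+1)$ matrix. In this notation, the derivative of a scalar with respect to the column vector $\boldsymbol{\beta}$ is a column vector, the derivative with respect to the row vector $\boldsymbol{\beta}^T$ is a row vector, and differentiating the column vector $\partial \ell / \partial \boldsymbol{\beta}$ with respect to $\boldsymbol{\beta}^T$ places the derivatives with respect to $\beta_1, \dots, \beta_n$ in successive columns, giving an $n \times n$ matrix. To visualize it better, write out the top-left block explicitly.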

Writing $\ell$ as shorthand for $\ell(\boldsymbol{\beta},\sigma^{2};\mathbf{y})$:

$$\frac{\partial}{\partial \boldsymbol{\beta}^{T}}\Big[\frac{\partial \ell}{\partial \boldsymbol{\beta}}\Big] = \frac{\partial}{\partial \boldsymbol{\beta}^{T}} \begin{pmatrix} \frac{\partial \ell}{\partial \beta_1} \\ \vdots \\ \frac{\partial \ell}{\partial \beta_n} \end{pmatrix} = \begin{pmatrix} \frac{\partial^2 \ell}{\partial \beta_1 \partial \beta_1} & \frac{\partial^2 \ell}{\partial \beta_1 \partial \beta_2} & \cdots & \frac{\partial^2 \ell}{\partial \beta_1 \partial \beta_n} \\ \frac{\partial^2 \ell}{\partial \beta_2 \partial \beta_1} & \frac{\partial^2 \ell}{\partial \beta_2 \partial \beta_2} & \cdots & \frac{\partial^2 \ell}{\partial \beta_2 \partial \beta_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 \ell}{\partial \beta_n \partial \beta_1} & \frac{\partial^2 \ell}{\partial \beta_n \partial \beta_2} & \cdots & \frac{\partial^2 \ell}{\partial \beta_n \partial \beta_n} \end{pmatrix}$$
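To confirm that entry $(i,j)$ of the top-left block really is $\partial^2 \ell / \partial \beta_i \, \partial \beta_j$, a central finite difference can be compared with the closed form $-X^TX/\sigma^2$ for the normal linear model. This is a sketch; the data, the step `h`, and the helper names `log_lik` and `beta_block_fd` are illustrative assumptions.

```python
import numpy as np

def log_lik(beta, sigma2, X, y):
    """Normal linear model log-likelihood; m = number of observations."""
    m = y.shape[0]
    r = y - X @ beta
    return -0.5 * m * np.log(2 * np.pi * sigma2) - (r @ r) / (2 * sigma2)

def beta_block_fd(beta, sigma2, X, y, h=1e-3):
    """Central-difference estimate of the n x n block; entry (i, j) ~ d^2 l / (d beta_i d beta_j)."""
    n = beta.size
    E = h * np.eye(n)               # row i is the step h * e_i
    out = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            out[i, j] = (log_lik(beta + E[i] + E[j], sigma2, X, y)
                         - log_lik(beta + E[i] - E[j], sigma2, X, y)
                         - log_lik(beta - E[i] + E[j], sigma2, X, y)
                         + log_lik(beta - E[i] - E[j], sigma2, X, y)) / (4 * h * h)
    return out

# Arbitrary illustrative data
rng = np.random.default_rng(2)
m, n = 30, 4
X = rng.normal(size=(m, n))
y = rng.normal(size=m)
beta = rng.normal(size=n)
sigma2 = 1.5

H_fd = beta_block_fd(beta, sigma2, X, y)
H_exact = -(X.T @ X) / sigma2       # closed form for the normal linear model
print(np.allclose(H_fd, H_exact, atol=1e-4))  # True
```

Because $\ell$ is quadratic in $\boldsymbol{\beta}$, this central difference is exact up to floating-point rounding, so the two matrices should agree to many digits.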
