Let $$f(y)=\prod_i \binom{n_i}{y_i}\pi(x_i)^{y_i}(1-\pi(x_i))^{n_i-y_i}$$
where $$\pi(x_i)=\frac{e^{\sum_j x_{ij}B_j}}{1+e^{\sum_j x_{ij}B_j}}$$
then the likelihood is
$$L\propto \prod_{i}\pi(x_i)^{y_i}(1-\pi(x_i))^{n_i-y_i}$$
$$l=\sum_i y_i \log\left(\frac{\pi(x_i)}{1-\pi(x_i)}\right)+\sum_i n_i \log(1-\pi(x_i))$$
I found that
$$\frac{d^2l}{dB\,dB}=-\sum_i n_i x_{ij}\frac{e^{\sum_j x_{ij}B_j}}{(1+e^{\sum_j x_{ij}B_j})^2}=-\sum_i n_i x_{ij}\pi(x_i)(1-\pi(x_i))$$
the Fisher information is
$$I(\hat{B})=-E\left[\frac{d^2l}{dB\,dB}\right]$$
and $$\mathrm{cov}(\hat{B})=I^{-1}(\hat{B})=\left(X'\,\mathrm{Diag}\left[n_i\hat{\pi}(x_i)(1-\hat{\pi}(x_i))\right]X\right)^{-1} \quad (*)$$
In the last line $X$ is a matrix.
Can anyone help me understand how to get $(*)$?
EDIT: $B$ is the vector of model parameters.
Best Answer
(Since this seems like a homework problem, I am just pointing out the essential details.)
You made a mistake in calculating the derivatives, and also ignored some important details. Your log-likelihood is correct; substituting the expression for $\pi(x_i)$ gives
$$l = \sum_i y_i \sum_j x_{ij} B_j - \sum_{i} n_i \log \left(1 + e^{\sum_j x_{ij}B_j} \right). $$
To calculate the $(l,k)$th entry of the information matrix (whose inverse is the covariance matrix), we first take the derivative with respect to $B_l$ and then take a second derivative with respect to $B_k$.
$$\dfrac{\partial l}{\partial B_l} = \sum_{i} y_i x_{il} - \sum_{i}n_i \dfrac{x_{il} e^{\sum_{j}x_{ij} B_j}}{1 + e^{\sum_{j}x_{ij} B_j}}. $$
$$\dfrac{\partial^2 l}{\partial B_l \partial B_k} = -\sum_{i} n_i x_{il}x_{ik}\dfrac{e^{\sum_{j}x_{ij} B_j}}{\left(1 + e^{\sum_{j}x_{ij} B_j} \right)^2} = -\sum_{i} n_i x_{il}x_{ik} \pi(x_i) (1- \pi(x_i)). $$
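For reference, the step behind the second derivative is the chain rule applied to $\pi(x_i)$. Writing $\eta_i = \sum_j x_{ij}B_j$ (a shorthand introduced here, not used in the question), and noting that $1-\pi(x_i) = 1/(1+e^{\eta_i})$:
$$\dfrac{\partial \pi(x_i)}{\partial B_k} = x_{ik}\dfrac{e^{\eta_i}}{\left(1+e^{\eta_i}\right)^2} = x_{ik}\,\pi(x_i)(1-\pi(x_i)). $$
Each differentiation with respect to a component of $B$ therefore brings down one factor of $x$, which is where the product $x_{il}x_{ik}$ comes from.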
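(You had missed the second $x_{ik}$ term.) Since the second derivative does not involve $y$, taking the expectation changes nothing, and the negative sign cancels. Thus the $(l,k)$th entry of the information matrix is $$I_{l,k}(B) = \sum_{i} n_i x_{il}x_{ik} \pi(x_i) (1- \pi(x_i)). $$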
Writing this in matrix form leads to the desired answer. Estimates for $\pi(x_i)$ can be substituted in the covariance matrix to obtain an estimate of the covariance.
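To see the matrix form concretely: if $X$ has entries $x_{il}$ and $w_i = n_i\pi(x_i)(1-\pi(x_i))$, then $\sum_i x_{il} w_i x_{ik}$ is exactly the $(l,k)$th entry of $X'\,\mathrm{Diag}[w_i]\,X$. The following is a minimal NumPy sketch of that identity, not a definitive implementation. The data `X`, `n`, and the coefficient vector `beta` (standing in for $\hat{B}$) are made up for illustration, and the function names are hypothetical.

```python
import numpy as np


def logistic_information(X, n, beta):
    """Information matrix X' Diag[n_i * pi_i * (1 - pi_i)] X."""
    eta = X @ beta                      # linear predictor sum_j x_ij * B_j
    pi = 1.0 / (1.0 + np.exp(-eta))     # pi(x_i) = e^eta / (1 + e^eta)
    w = n * pi * (1.0 - pi)             # diagonal weights
    return X.T @ (w[:, None] * X)       # avoids building the full diagonal matrix


def information_entrywise(X, n, beta):
    """Same matrix, built entry by entry from the (l, k) sum formula."""
    eta = X @ beta
    pi = 1.0 / (1.0 + np.exp(-eta))
    p = X.shape[1]
    info = np.zeros((p, p))
    for l in range(p):
        for k in range(p):
            info[l, k] = np.sum(n * X[:, l] * X[:, k] * pi * (1.0 - pi))
    return info


# Made-up illustrative data: an intercept column plus one covariate.
X = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
n = np.array([10.0, 20.0, 15.0, 5.0])   # binomial totals n_i
beta = np.array([0.3, -0.5])            # assumed value playing the role of B-hat

info = logistic_information(X, n, beta)
cov = np.linalg.inv(info)               # estimated covariance of B-hat, as in (*)
```

Comparing `info` with `information_entrywise(X, n, beta)` using `np.allclose` is a quick way to confirm that the sum formula and the matrix form agree.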