# Derivation of confidence and prediction intervals of predictions for probit and logit (and GLMs in general)

The derivation of the prediction interval for the linear model is quite simple; see "Obtaining a formula for prediction limits in a linear model".

How can the confidence and prediction intervals for the fitted values of logit and probit regressions (and GLMs in general) be derived?

In a GLM, the prediction is a non-linear function $f$ of the product of the covariates $X$ and the estimated coefficient vector $\hat{\beta}$:

$$\hat{y} = f(X\hat{\beta})$$

The finite-sample distribution of $\hat{\beta}$ is generally unknown, but as long as $\hat{\beta}$ is a maximum likelihood estimate, it is asymptotically normal, $\mathcal{N}(\beta, -H^{-1})$, where $H$ is the Hessian matrix of the log-likelihood function at its maximum. The p-values for the coefficients that appear in regression output are nearly always based on this asymptotic result. If you feel your sample is too small for the asymptotics to hold, estimate the distribution numerically instead (e.g. by bootstrapping).

When you use the asymptotic normal distribution of $\hat{\beta}$ (and therefore of $X\hat{\beta}$), the distribution of $\hat{y}$ is still non-normal because $f$ is non-linear. You can sidestep this: compute normal confidence bounds $(z_{lower}, z_{upper})$ for $X\beta$ and plug them into $f$, which gives bounds for $f(X\beta)$ as $(y_{lower}, y_{upper}) = (f(z_{lower}), f(z_{upper}))$. This works because $f$ is monotone (increasing for the logit and probit links), so the ordering and coverage of the bounds carry over.
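The transformed-bounds approach can be sketched in a few lines of Python for the logit case. This is a minimal illustration, not a reference implementation: the function names `logistic` and `transformed_interval` are made up for this sketch, and `beta_hat` and `cov` (standing in for $-H^{-1}$) are hypothetical numbers rather than output from a real fit.

```python
import numpy as np

def logistic(z):
    """Inverse logit link f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def transformed_interval(x, beta_hat, cov, z_crit=1.96):
    """Normal bounds for the linear predictor x @ beta, mapped through f."""
    eta = x @ beta_hat                      # point estimate of the linear predictor
    se_eta = np.sqrt(x @ cov @ x)           # its asymptotic standard error
    z_lower = eta - z_crit * se_eta
    z_upper = eta + z_crit * se_eta
    # f is increasing, so the bounds keep their order after the transform
    return logistic(z_lower), logistic(eta), logistic(z_upper)

# Hypothetical values for illustration only (not from any real data set)
beta_hat = np.array([-0.5, 1.2])
cov = np.array([[0.04, -0.01],
                [-0.01, 0.09]])             # stands in for -H^{-1}
x = np.array([1.0, 0.8])                    # intercept plus one covariate value

y_lower, y_hat, y_upper = transformed_interval(x, beta_hat, cov)
print(y_lower, y_hat, y_upper)
```

The resulting interval is asymmetric around the point estimate and always stays inside $(0, 1)$, since both bounds pass through the logistic function.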

Another strategy (called the delta method) is to take a first-order Taylor expansion of $f(X\hat{\beta})$ around $X\beta$, which is linear in $\hat{\beta}$. You can therefore approximate the distribution of $f(X\hat{\beta})$ as

$$f(X\hat{\beta}) \sim \mathcal{N}\left(f(X\beta),\; -(f'(X\beta))^2 X H^{-1} X^T \right)$$

Here $X$ is the covariate row vector of a single observation, so the variance is a scalar.

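Then the asymptotic 95% confidence interval for $f(X\beta)$ would look like

$$f(X\hat{\beta}) \pm 1.96 \sqrt{-(f'(X\hat{\beta}))^2 X H(\hat{\beta})^{-1} X^T}$$

The minus sign under the square root is needed because $H$ is negative definite at the maximum, so $-H^{-1}$ is the positive definite covariance matrix.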
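As a rough sketch of the delta-method interval for the logit link, where $f'(z) = f(z)(1 - f(z))$: the function name `delta_interval` is hypothetical, and `beta_hat` and `cov` (standing in for $-H(\hat{\beta})^{-1}$) are made-up numbers for illustration, not results from a real fit.

```python
import numpy as np

def delta_interval(x, beta_hat, cov, z_crit=1.96):
    """Delta-method interval for f(x @ beta) with the logistic link."""
    eta = x @ beta_hat
    p_hat = 1.0 / (1.0 + np.exp(-eta))      # f(x @ beta_hat)
    f_prime = p_hat * (1.0 - p_hat)         # derivative of the logistic function at eta
    # sqrt(f'^2 * x cov x^T), with cov playing the role of -H^{-1}
    se = abs(f_prime) * np.sqrt(x @ cov @ x)
    return p_hat - z_crit * se, p_hat, p_hat + z_crit * se

# Hypothetical values for illustration only
beta_hat = np.array([-0.5, 1.2])
cov = np.array([[0.04, -0.01],
                [-0.01, 0.09]])
x = np.array([1.0, 0.8])

lower, p_hat, upper = delta_interval(x, beta_hat, cov)
print(lower, p_hat, upper)
```

Unlike the transformed bounds, this interval is symmetric around the point estimate, and for predictions close to 0 or 1 it can extend outside $[0, 1]$.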

All that remains is to find the expression for the Hessian matrix of the particular model; a related question does this for logistic regression. Another related question presents a practical comparison of the bootstrap, transformed normal bounds, and the delta method for logistic regression.
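For logistic regression the Hessian of the log-likelihood has the standard closed form $H = -X^T W X$ with $W = \operatorname{diag}(\hat{p}_i (1 - \hat{p}_i))$, where $X$ is now the full design matrix. The sketch below fits a logit model by Newton-Raphson on simulated data and returns $-H^{-1}$ as the asymptotic covariance. The function name `fit_logit`, the fixed iteration count, and the simulated `true_beta` are assumptions made for this example only.

```python
import numpy as np

def fit_logit(X, y, n_iter=25):
    """Newton-Raphson MLE for logistic regression.

    Returns beta_hat, the Hessian H of the log-likelihood at beta_hat,
    and the asymptotic covariance -H^{-1}.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                        # score vector
        H = -(X * (p * (1.0 - p))[:, None]).T @ X   # Hessian: -X^T W X
        beta = beta - np.linalg.solve(H, grad)      # Newton step
    # Re-evaluate the Hessian at the final estimate
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    H = -(X * (p * (1.0 - p))[:, None]).T @ X
    cov = np.linalg.inv(-H)
    return beta, H, cov

# Simulated data with hypothetical true coefficients
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([-0.5, 1.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))

beta_hat, H, cov = fit_logit(X, y)
print(beta_hat)
print(np.sqrt(np.diag(cov)))   # asymptotic standard errors
```

A production fit would add a convergence check and guard against perfect separation; the fixed number of Newton steps is only a simplification for the sketch.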
