Solved – Standard Error and confidence interval for multiple linear regression in matrix notation

Let's say we are trying to fit a normal linear model to our data as follows:

$y = \beta_0 + x_1\beta_1 + x_2\beta_2 + \dots + x_p\beta_p + \epsilon$.

My question is how we can derive the standard error and confidence interval for each of the coefficients in matrix notation.

Under the usual normal Gauss–Markov model where $\boldsymbol{Y}=\boldsymbol{X\beta}+\boldsymbol{\epsilon}$, it is assumed that $\boldsymbol{\epsilon}\sim N\left(0,\boldsymbol{I}\sigma^{2}\right)$, where $\boldsymbol{I}$ is the $n\times n$ identity matrix and $n$ is the number of observations in your dataset. This implies $Var(\boldsymbol{Y})=Var\left(\boldsymbol{X\beta}+\boldsymbol{\epsilon}\right)=Var(\boldsymbol{\epsilon})=\boldsymbol{I}\sigma^{2}$. Then, under this model:

\begin{eqnarray*}
Var\left(\hat{\boldsymbol{\beta}}\right) & = & Var\left[\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-}\boldsymbol{X}^{\prime}\boldsymbol{Y}\right]\\
 & = & Var\left[\boldsymbol{A}\boldsymbol{Y}\right]\\
 & = & \boldsymbol{A}\,Var(\boldsymbol{Y})\,\boldsymbol{A}^{\prime}\\
 & = & \boldsymbol{A}\left(\boldsymbol{I}\sigma^{2}\right)\boldsymbol{A}^{\prime}\\
 & = & \left[\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-}\boldsymbol{X}^{\prime}\right]\left(\boldsymbol{I}\sigma^{2}\right)\left[\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-}\boldsymbol{X}^{\prime}\right]^{\prime}\\
 & = & \sigma^{2}\left[\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-}\right]\left[\boldsymbol{X}^{\prime}\boldsymbol{X}\right]\left[\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-}\right]\\
 & = & \sigma^{2}\boldsymbol{I}\left[\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-}\right]\\
 & = & \sigma^{2}\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-}
\end{eqnarray*}

where $\boldsymbol{A}=\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-}\boldsymbol{X}^{\prime}$ is just shorthand for the fixed matrix multiplying $\boldsymbol{Y}$, and the symmetry of $\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-}$ lets the transpose drop out.
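As a quick numerical sanity check (a minimal sketch with a small made-up design matrix and an assumed known $\sigma^{2}$, using MASS::ginv for the generalized inverse), the sandwich form above collapses to $\sigma^{2}\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-}$ as claimed:

library(MASS)  ## for ginv()

## made-up example values, just for this check
set.seed(1)
n      <- 10
X      <- cbind(1, rnorm(n), rnorm(n))  ## small arbitrary design matrix
sigma2 <- 4                             ## sigma^2 assumed known here

A        <- ginv(t(X) %*% X) %*% t(X)          ## A = (X'X)^- X'
sandwich <- A %*% (sigma2 * diag(n)) %*% t(A)  ## A (I sigma^2) A'
direct   <- sigma2 * ginv(t(X) %*% X)          ## sigma^2 (X'X)^-

all.equal(sandwich, direct)  ## TRUE, up to floating-point error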

Normally we don't know $\sigma^{2}$, so it has to be estimated from the residuals $\hat{\boldsymbol{\epsilon}}$, which gives us $Var\left(\hat{\boldsymbol{\beta}}\right)=\hat{\sigma}^{2}\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-}$. Without getting into all the gory details, the estimate of $\sigma^{2}$ is given by: $$\hat{\sigma}^{2}=\frac{\hat{\boldsymbol{\epsilon}}^{\prime}\hat{\boldsymbol{\epsilon}}}{n-r}$$ where $n$ is the number of rows in $\boldsymbol{X}$ and $r$ is the rank of $\boldsymbol{X}$.
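In R this estimate is a couple of lines (a minimal sketch, assuming a design matrix X including its intercept column and a response y are already defined, with MASS loaded for ginv, as in the worked example below):

## residual vector from the least-squares fit
e.hat <- y - X %*% ginv(t(X) %*% X) %*% t(X) %*% y
## sigma^2-hat = e'e / (n - r)
sigma2.hat <- as.numeric(t(e.hat) %*% e.hat) / (nrow(X) - qr(X)$rank)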

Then, again, without going into all the distributional derivations and other gory details, the $100(1-\alpha)\%$ confidence interval for $\beta_{j}$, $j=0,1,\ldots,k$ (where $k=r-1$), is given by:

\begin{eqnarray*} \hat{\beta}_{j} & \pm & t_{\alpha/2,\,n-r}\,\hat{\sigma}\sqrt{g_{jj}} \end{eqnarray*}

where $g_{jj}$ is the $j$th diagonal element of $\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-}$, and $t_{\alpha/2,\,n-r}$ is the upper $\alpha/2$ critical value, i.e. the $100(1-\alpha/2)$th percentile, of the $t$-distribution with $n-r$ degrees of freedom (qt(1 - alpha/2, n - r) in the R code below).

Here is an example of the computations done manually and then checked against confidence intervals computed by R:

> library(MASS)
>
> ##create some data
> data("mtcars")
> y <- mtcars$mpg
> y
 [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3
[27] 26.0 30.4 15.8 19.7 15.0 21.4
> x0 <- rep(1, dim(mtcars)[1])
> x1 <- mtcars$disp
> x2 <- mtcars$hp
> x3 <- mtcars$drat
> X <- cbind(x0, x1, x2, x3)
>
> ##print first rows of design matrix X
> head(X)
     x0  x1  x2   x3
[1,]  1 160 110 3.90
[2,]  1 160 110 3.90
[3,]  1 108  93 3.85
[4,]  1 258 110 3.08
[5,]  1 360 175 3.15
[6,]  1 225 105 2.76
>
> ##calculate beta hats - note: not efficient, just for teaching
> beta.hat <- ginv(t(X)%*%X)%*%t(X)%*%y
> beta.hat
            [,1]
[1,] 19.34429256
[2,] -0.01923223
[3,] -0.03122932
[4,]  2.71497521
> ##get predicted values
> y.hat <- X%*%beta.hat
> residuals <- y - y.hat
> ##find n and rank, r
> r <- qr(X)$rank
> n <- dim(X)[1]
> ##compute covariance matrix
> var.beta.hat <- as.numeric(t(residuals)%*%residuals/(n - r))*ginv(t(X)%*%X)
> ##compute confidence interval
> alpha <- .05
> t <- qt(1 - alpha/2, n - r)
> conf.int <- round(cbind(beta.hat - t*sqrt(diag(var.beta.hat)), beta.hat + t*sqrt(diag(var.beta.hat))), 5)
> conf.int
         [,1]     [,2]
[1,]  6.29413 32.39445
[2,] -0.03843 -0.00004
[3,] -0.05857 -0.00389
[4,] -0.33176  5.76171
>
> ##fit model with R's standard lm() function
> r.fit <- lm(y ~ x1 + x2 + x3)
> ##show confidence intervals in R
> round(confint(r.fit, names(r.fit$coef), level = 0.95), 5)
               2.5 %   97.5 %
(Intercept)  6.29413 32.39445
x1          -0.03843 -0.00004
x2          -0.05857 -0.00389
x3          -0.33176  5.76171

Update 1: To Assist with Your Additional Comment/Question About the Standard Error

@prony, we are trying to calculate the standard error here, so I'm not sure what you mean when you say the standard error is defined as $\sigma/\sqrt{n}$ (that is the standard error of a sample mean, not of a regression coefficient). But I think I understand your question. To understand where $\sqrt{g_{jj}}$ comes from, first notice that the standard error is just the square root of the variance of an estimator, in this case $\hat{\boldsymbol{\beta}}$. So really, all we are trying to do is obtain the square root of each of the diagonal elements of $Var(\hat{\boldsymbol{\beta}}) = \hat{\sigma}^2(\boldsymbol{X}^\prime \boldsymbol{X})^-$. Here, $\hat{\boldsymbol{\beta}}$ is a vector, so to obtain the standard error for a single coefficient (what I call $\hat{\beta}_j$; note it is no longer bolded, since it's just a single element), we pick off the element in the $j$th row and $j$th column of the $\hat{\sigma}^2(\boldsymbol{X}^\prime \boldsymbol{X})^-$ matrix and take its square root. (This amounts to taking the square root of the diagonal entries of the matrix, which are the variances before square-rooting; the off-diagonal entries are the covariances.) I have simply defined $g_{jj}$ to be the $j$th diagonal element of the square matrix $(\boldsymbol{X}^\prime \boldsymbol{X})^-$, so that when we take the square root of the corresponding diagonal element of $\hat{\sigma}^2(\boldsymbol{X}^\prime \boldsymbol{X})^-$ to obtain the standard error, we get $\sqrt{\hat{\sigma}^2 g_{jj}}=\hat{\sigma}\sqrt{g_{jj}}$.
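To make that concrete in R (reusing the var.beta.hat matrix and the r.fit model from the worked example above), the standard errors are just the square roots of the diagonal entries, and they match what summary() reports for the lm fit:

## standard errors = square roots of the diagonal of Var(beta-hat)
se.beta.hat <- sqrt(diag(var.beta.hat))
se.beta.hat

## cross-check against lm's standard errors
summary(r.fit)$coefficients[, "Std. Error"]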
