# Solved – Standard Error and confidence interval for multiple linear regression in matrix notation

Let's say we are trying to fit a normal linear model to our data as follows:

$$y = \beta_0 + x_1\beta_1 + x_2\beta_2 + \dots + x_p\beta_p + \epsilon.$$

My question is: how can we derive the standard error and confidence interval for each of the coefficients in matrix notation?


Under the usual normal Gauss-Markov model where $$\boldsymbol{Y}=\boldsymbol{X\beta}+\boldsymbol{\epsilon}$$, it is assumed that $$\boldsymbol{\epsilon}\sim N\left(0,\boldsymbol{I}\sigma^{2}\right)$$, where $$\boldsymbol{I}$$ is an $$n\times n$$ identity matrix and $$n$$ is the number of observations in your dataset. This implies $$Var(\boldsymbol{Y})=Var\left(\boldsymbol{X\beta}+\boldsymbol{\epsilon}\right)=Var(\boldsymbol{\epsilon})=\boldsymbol{I}\sigma^{2}$$. Then, under this model:

$$\begin{eqnarray*} Var\left(\hat{\boldsymbol{\beta}}\right) & = & Var\left[\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-}\boldsymbol{X}^{\prime}\boldsymbol{Y}\right]\\ & = & Var\left[\boldsymbol{AY}\right]\\ & = & \boldsymbol{A}\,Var(\boldsymbol{Y})\,\boldsymbol{A}^{\prime}\\ & = & \boldsymbol{A}\left(\boldsymbol{I}\sigma^{2}\right)\boldsymbol{A}^{\prime}\\ & = & \left[\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-}\boldsymbol{X}^{\prime}\right]\left(\boldsymbol{I}\sigma^{2}\right)\left[\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-}\boldsymbol{X}^{\prime}\right]^{\prime}\\ & = & \sigma^{2}\left[\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-}\right]\left[\boldsymbol{X}^{\prime}\boldsymbol{X}\right]\left[\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-}\right]\\ & = & \sigma^{2}\boldsymbol{I}\left[\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-}\right]\\ & = & \sigma^{2}\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-} \end{eqnarray*}$$
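As a sanity check on this derivation, here is a short Monte Carlo sketch in Python (not part of the original answer; the design matrix, true $$\beta$$, and $$\sigma$$ are made-up illustration values). Simulating many responses from the model and re-fitting each time, the empirical covariance of the $$\hat{\boldsymbol{\beta}}$$'s should approach $$\sigma^{2}\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-}$$:

```python
import numpy as np

# Made-up design: intercept plus one predictor, with known sigma.
rng = np.random.default_rng(0)
n, sigma = 200, 2.0
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
beta = np.array([1.0, 0.5])

XtX_inv = np.linalg.inv(X.T @ X)
theory = sigma**2 * XtX_inv                    # sigma^2 (X'X)^{-1}

# Simulate 20000 response vectors and fit beta-hat = (X'X)^{-1} X' Y for each.
Y = X @ beta[:, None] + rng.normal(0, sigma, size=(n, 20000))
B = XtX_inv @ X.T @ Y                          # one beta-hat column per simulation
empirical = np.cov(B)                          # 2x2 empirical covariance of the beta-hats

print(np.max(np.abs(empirical - theory)))      # small: the two matrices agree
```

With enough replicates the empirical and theoretical covariance matrices match to a few decimal places, which is exactly what the algebra above promises.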

Normally we don't know $$\sigma^{2}$$, so it has to be estimated from the residuals, $$\hat{\boldsymbol{\epsilon}}$$, giving us: $$Var\left(\hat{\boldsymbol{\beta}}\right)=\hat{\sigma}^{2}\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-}$$. Without getting into all the gory details, the estimate of $$\sigma^{2}$$ is given by: $$\hat{\sigma}^{2}=\frac{\hat{\boldsymbol{\epsilon}}^{\prime}\hat{\boldsymbol{\epsilon}}}{n-r}$$, where $$n$$ is the number of rows in $$\boldsymbol{X}$$ and $$r$$ is the rank of $$\boldsymbol{X}$$.
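A minimal numpy sketch of this plug-in step (the data below are simulated purely for illustration and are not from the post):

```python
import numpy as np

# Simulated example: intercept plus two predictors, true error sd = 1.5.
rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 1.5, n)

# Fit via a generalized inverse: beta-hat = (X'X)^- X' y
XtX_pinv = np.linalg.pinv(X.T @ X)
beta_hat = XtX_pinv @ X.T @ y

# Estimate sigma^2 from the residuals as e'e / (n - r), with r = rank(X).
resid = y - X @ beta_hat
r = np.linalg.matrix_rank(X)
sigma2_hat = (resid @ resid) / (n - r)

# Plug in to get the estimated covariance matrix of beta-hat.
var_beta_hat = sigma2_hat * XtX_pinv
```

Here `sigma2_hat` lands near the true error variance of $$1.5^{2}=2.25$$, and `var_beta_hat` is the symmetric matrix whose diagonal feeds the standard errors below.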

Then, again, without going into all the distributional derivations and other gory details, the $$100(1-\alpha)\%$$ confidence interval for $$\beta_{j}$$, $$j=0,1,\ldots,k$$ ($$k=r-1$$), is given by:

$$\begin{eqnarray*} \hat{\beta}_{j} & \pm & t_{\alpha/2,\,n-r}\,\hat{\sigma}\sqrt{g_{jj}} \end{eqnarray*}$$

where $$g_{jj}$$ is the $$j$$th diagonal element of $$\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-}$$, and $$t_{\alpha/2,\,n-r}$$ is the upper $$100(\alpha/2)$$ percentage point (i.e., the $$100(1-\alpha/2)$$ percentile) of the $$t$$-distribution with $$n-r$$ degrees of freedom.

Here is an example of the computations done manually and then checked against confidence intervals computed by R:

```r
> library(MASS)
>
> ## create some data
> data("mtcars")
> y <- mtcars$mpg
> y
21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4
14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7 15.0 21.4
> x0 <- rep(1, nrow(mtcars))
> x1 <- mtcars$disp
> x2 <- mtcars$hp
> x3 <- mtcars$drat
> X <- cbind(x0, x1, x2, x3)
>
> ## print first rows of design matrix X
> head(X)
     x0  x1  x2   x3
[1,]  1 160 110 3.90
[2,]  1 160 110 3.90
[3,]  1 108  93 3.85
[4,]  1 258 110 3.08
[5,]  1 360 175 3.15
[6,]  1 225 105 2.76
>
> ## calculate beta hats - note: not efficient, just for teaching
> beta.hat <- ginv(t(X) %*% X) %*% t(X) %*% y
> beta.hat
            [,1]
[1,] 19.34429256
[2,] -0.01923223
[3,] -0.03122932
[4,]  2.71497521
> ## get predicted values
> y.hat <- X %*% beta.hat
> residuals <- y - y.hat
> ## find n and rank, r
> r <- qr(X)$rank
> n <- nrow(X)
> ## compute covariance matrix
> var.beta.hat <- as.numeric(t(residuals) %*% residuals / (n - r)) * ginv(t(X) %*% X)
> ## compute confidence interval
> alpha <- .05
> t <- qt(1 - alpha/2, n - r)
> conf.int <- round(cbind(beta.hat - t * sqrt(diag(var.beta.hat)),
+                         beta.hat + t * sqrt(diag(var.beta.hat))), 5)
> conf.int
         [,1]     [,2]
[1,]  6.29413 32.39445
[2,] -0.03843 -0.00004
[3,] -0.05857 -0.00389
[4,] -0.33176  5.76171
>
> ## fit model with R's standard lm function
> r.fit <- lm(y ~ x1 + x2 + x3)
> ## show confidence intervals in R
> round(confint(r.fit, names(r.fit$coef), level = 0.95), 5)
               2.5 %   97.5 %
(Intercept)  6.29413 32.39445
x1          -0.03843 -0.00004
x2          -0.05857 -0.00389
x3          -0.33176  5.76171
```

@prony, we are trying to calculate the standard error here, so I'm not sure what you mean by the standard error being defined as $$\sigma/\sqrt{n}$$. But I think I understand your question. To understand where $$\sqrt{g_{jj}}$$ comes from, first notice that the standard error is just the square root of the variance of an estimator, in this case $$\hat{\boldsymbol{\beta}}$$. So really, all we are trying to do is obtain the square root of each of the diagonal elements of $$Var(\hat{\boldsymbol{\beta}}) = \hat{\sigma}^{2}\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-}$$. Here, $$\hat{\boldsymbol{\beta}}$$ is a vector, so to obtain the standard error for a single coefficient (what I call $$\hat{\beta}_{j}$$; note it is no longer bolded, as it is a single element), we pick off the element in the $$j$$th row and $$j$$th column of the $$\hat{\sigma}^{2}\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-}$$ matrix and take its square root. This amounts to taking the square root of the diagonal entries of the matrix, which are the variances before square-rooting; the off-diagonal entries are the covariances. I have simply defined $$g_{jj}$$ to be the $$j$$th diagonal element of the square matrix $$\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-}$$, so that when we take the square root of the $$j$$th diagonal element of $$\hat{\sigma}^{2}\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-}$$ to obtain the standard error, we get $$\sqrt{\hat{\sigma}^{2}g_{jj}}=\hat{\sigma}\sqrt{g_{jj}}.$$
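To make the point concrete, here is a small numpy sketch (simulated data, not from the thread) that pulls the standard errors straight off the diagonal:

```python
import numpy as np

# Made-up data: intercept plus one predictor.
rng = np.random.default_rng(2)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

XtX_pinv = np.linalg.pinv(X.T @ X)           # (X'X)^-
beta_hat = XtX_pinv @ X.T @ y
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - np.linalg.matrix_rank(X))

g = np.diag(XtX_pinv)                        # the g_jj values: diagonal of (X'X)^-
se = np.sqrt(sigma2_hat * g)                 # hat-sigma * sqrt(g_jj), one per coefficient

# Same numbers obtained from the full covariance matrix: sqrt of its diagonal.
se_check = np.sqrt(np.diag(sigma2_hat * XtX_pinv))
```

Both routes give identical standard errors, since multiplying the matrix by $$\hat{\sigma}^{2}$$ and then taking diagonal square roots is exactly $$\hat{\sigma}\sqrt{g_{jj}}$$.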