I read that in a regression with $k$ regressors, the t-statistic corresponding to a certain coefficient follows a $t(n-k)$ distribution. However, later on I read that studentized residuals follow a $t(n-k-1)$ distribution. How could it be that the extra degree of freedom is lost: isn't this also just based on a regular t-statistic?


#### Best Answer

In my opinion there are two possible explanations here:

1. Externally studentized residuals are based on a fit with one observation deleted; this may account for the loss of the single degree of freedom.

2. Books are inconsistent about what $k$ actually refers to in a multiple regression model. If $k$ is the number of regressors, then the correct degrees of freedom is $n-k-1$. On the other hand, if $k$ is the number of regression coefficients (usually the number of regressors plus one, for the intercept), then the correct degrees of freedom is $n-k$.

Note: In general, the distribution of studentized residuals does not depend on whether there are dummy variables in the model. To be clear, let the regression model be $Y = X\beta + \epsilon$, where $X \in \mathbb{R}^{n \times (k+1)}$, $n$ is the number of observations, and $k$ is the number of regressors. The design matrix $X$ may contain continuous variables, dummy variables, or both. In this general framework, the *internally studentized residual* is defined as

$$ r_i = \frac{e_i}{\sqrt{MSE\,(1-h_{ii})}} $$ where $e_i$ is the $i^{\text{th}}$ residual and $H = (h_{ij}) = X(X'X)^{-1}X'$ is the so-called "hat" matrix. The internally studentized residuals do **not** follow a $t$-distribution, because $e_i$ and $MSE$ are not independent.
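The definition above can be computed directly. A minimal numpy sketch on synthetic data (all variable names illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 2                                                 # n observations, k regressors
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # design matrix with intercept
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimate
e = y - X @ beta_hat                           # residuals e_i
H = X @ np.linalg.solve(X.T @ X, X.T)          # hat matrix H = X (X'X)^{-1} X'
h = np.diag(H)                                 # leverages h_ii
mse = e @ e / (n - k - 1)                      # residual mean square, df = n - k - 1
r = e / np.sqrt(mse * (1 - h))                 # internally studentized residuals
```

Note that the trace of $H$ equals the number of fitted coefficients, $k+1$, which is where the $n-k-1$ residual degrees of freedom come from.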

The *externally studentized residual* is defined as

$$ t_i = \frac{e_i}{\sqrt{MSE_{(i)}\,(1-h_{ii})}} $$ where $MSE_{(i)}$ is the mean-square error from the regression model fitted with the $i^{\text{th}}$ observation deleted. In this case, $e_i$ and $MSE_{(i)}$ are independent, and it can be shown that $t_i \sim t_{n-k-2}$; the extra degree of freedom is lost due to the deletion of observation $i$.
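One can check the leave-one-out definition numerically. The sketch below (synthetic data, names illustrative) computes $t_i$ by brute force, refitting with observation $i$ deleted so that $MSE_{(i)}$ has $n-k-2$ degrees of freedom, and compares against the standard shortcut identity $t_i = r_i \sqrt{(n-k-2)/(n-k-1-r_i^2)}$, which avoids refitting:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ rng.normal(size=k + 1) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)    # hat matrix
e = y - H @ y                            # residuals
h = np.diag(H)
mse = e @ e / (n - k - 1)
r = e / np.sqrt(mse * (1 - h))           # internally studentized

# Brute force: refit with observation i deleted, use MSE_(i).
t_brute = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    Xi, yi = X[keep], y[keep]
    bi = np.linalg.lstsq(Xi, yi, rcond=None)[0]
    ei = yi - Xi @ bi
    mse_i = ei @ ei / ((n - 1) - (k + 1))          # df = n - k - 2
    t_brute[i] = e[i] / np.sqrt(mse_i * (1 - h[i]))

# Shortcut: externally studentized from the internal version.
t_formula = r * np.sqrt((n - k - 2) / (n - k - 1 - r**2))
```

The two agree to machine precision, which also makes the degrees-of-freedom bookkeeping concrete: the deleted-observation fit has $n-1$ observations and $k+1$ coefficients, hence $n-k-2$ residual degrees of freedom.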

I hope this makes it clearer. To understand the degrees of freedom in your case, consider the design matrix as a whole, and do not break it into two parts, one with continuous predictors and one with dummies. Once you do that, figure out whether the studentized residuals in question are externally or internally studentized, then apply the above.
