I read that in a regression with $k$ regressors, the t-statistic corresponding to a certain coefficient follows a $t(n-k)$ distribution. However, later on I read that studentized residuals follow a $t(n-k-1)$ distribution. How could it be that the extra degree of freedom is lost: isn't this also just based on a regular t-statistic?


#### Best Answer

In my opinion there are two possible explanations here:

1. Externally studentized residuals are based on a fit with one observation deleted; this may account for the loss of the single degree of freedom.

2. Books are inconsistent about what $k$ actually refers to in a multiple regression model. If $k$ is the number of regressors, then the correct degrees of freedom is $n-k-1$. On the other hand, if $k$ is the number of regression coefficients (usually the number of regressors plus one, for the intercept), then the correct degrees of freedom is $n-k$.

Note: In general, the distribution of studentized residuals does not depend on whether there are dummy variables in the model. To be clear, let the regression model be $Y = X\beta + \epsilon$, where $X \in \mathbb{R}^{n \times (k+1)}$, $n$ is the number of observations, and $k$ is the number of regressors. The design matrix $X$ may contain continuous variables, dummy variables, or both. In this general framework, the *internally studentized residual* is defined as

$$ r_i = \frac{e_i}{\sqrt{MSE\,(1-h_{ii})}} $$ where $e_i$ is the $i^{\text{th}}$ residual and $H = (h_{ij}) = X(X'X)^{-1}X'$ is the so-called "hat" matrix. The internally studentized residuals do **not** follow a $t$-distribution, because $e_i$ and $MSE$ are not independent.
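The definition above can be computed directly. A minimal numpy sketch on synthetic data (all variable names illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 2                                                 # n observations, k regressors
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # design matrix with intercept
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimate
e = y - X @ beta_hat                           # residuals e_i
H = X @ np.linalg.solve(X.T @ X, X.T)          # hat matrix H = X (X'X)^{-1} X'
h = np.diag(H)                                 # leverages h_ii
mse = e @ e / (n - k - 1)                      # residual mean square, df = n - k - 1
r = e / np.sqrt(mse * (1 - h))                 # internally studentized residuals
```

Note that the trace of $H$ equals the number of fitted coefficients, $k+1$, which is where the $n-k-1$ residual degrees of freedom come from.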

The *externally studentized residual* is defined as

$$ t_i = \frac{e_i}{\sqrt{MSE_{(i)}\,(1-h_{ii})}} $$ where $MSE_{(i)}$ is the mean-square error from the regression model fitted with the $i^{\text{th}}$ observation deleted. In this case, $e_i$ and $MSE_{(i)}$ are independent, and it can be shown that $t_i \sim t_{n-k-2}$; the extra degree of freedom is lost due to the deletion of observation $i$.
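One can check the leave-one-out definition numerically. The sketch below (synthetic data, names illustrative) computes $t_i$ by brute force, refitting with observation $i$ deleted so that $MSE_{(i)}$ has $n-k-2$ degrees of freedom, and compares against the standard shortcut identity $t_i = r_i \sqrt{(n-k-2)/(n-k-1-r_i^2)}$, which avoids refitting:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ rng.normal(size=k + 1) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)    # hat matrix
e = y - H @ y                            # residuals
h = np.diag(H)
mse = e @ e / (n - k - 1)
r = e / np.sqrt(mse * (1 - h))           # internally studentized

# Brute force: refit with observation i deleted, use MSE_(i).
t_brute = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    Xi, yi = X[keep], y[keep]
    bi = np.linalg.lstsq(Xi, yi, rcond=None)[0]
    ei = yi - Xi @ bi
    mse_i = ei @ ei / ((n - 1) - (k + 1))          # df = n - k - 2
    t_brute[i] = e[i] / np.sqrt(mse_i * (1 - h[i]))

# Shortcut: externally studentized from the internal version.
t_formula = r * np.sqrt((n - k - 2) / (n - k - 1 - r**2))
```

The two agree to machine precision, which also makes the degrees-of-freedom bookkeeping concrete: the deleted-observation fit has $n-1$ observations and $k+1$ coefficients, hence $n-k-2$ residual degrees of freedom.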

I hope this makes it clearer. To understand the degrees of freedom in your case, consider the design matrix as a whole, and do not break it into two parts, one with continuous predictors and one with dummies. Once you do that, figure out whether the studentized residuals in question are externally or internally studentized, then apply the above.
