Solved – How is the adjusted coefficient of determination ($R^2$) linked to the F-value of a test against zero when adding a new variable

Someone claims that the adjusted $R^2$ will increase with the addition of an extra variable.
I wonder why, as it is called adjusted (in contrast to the normal $R^2$).

The only condition it has to satisfy (for the adjusted $R^2$ to increase) is that the F-value of the test of the null hypothesis that the coefficient of the new variable is zero (by the way, how does one calculate it easily?) is greater than 1.

Can someone give me a hint as to how the adjusted $R^2$ and the F-statistic of that test are linked?

Besides, who would want to include a new variable in a multiple OLS regression model if its beta has been tested to be 0? So for any variable one would actually add, the adjusted $R^2$ would change anyway.

The assertion of the question is true. We usually show the inverse situation, i.e. the case of dropping one variable. In a linear multiple regression model $y_i = \mathbf{x}_i'\beta + u_i,\; i=1,\dots,n$, with $k$ regressors (including the constant term), if the t-ratio $t$ of a variable is less than 1 in absolute value, then dropping this one variable will increase the adjusted R-squared, $\bar R^2$. When dealing with dropping one variable, the corresponding F-statistic (reflecting just one linear restriction) is equal to $t^2$ (see this post). So both should be smaller than unity for $\bar R^2$ to increase. This result can be proven as follows: $\bar R^2$ is defined via

$$ 1- \bar R^2 = \frac {n-1}{n-k} (1-R^2) \qquad [1]$$
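
As a quick numerical illustration of $[1]$, here is a minimal Python sketch; the values of $R^2$, $n$ and $k$ plugged in are made up for the example:

```python
# Adjusted R^2 from equation [1]: 1 - (n-1)/(n-k) * (1 - R^2)
def adjusted_r2(r2, n, k):
    """r2: plain R^2, n: sample size, k: number of regressors including the constant."""
    return 1 - (n - 1) / (n - k) * (1 - r2)

# Made-up numbers: the same R^2 is penalised more heavily as k grows
print(adjusted_r2(0.40, n=50, k=3))   # ~0.374
print(adjusted_r2(0.40, n=50, k=10))  # ~0.265
```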

Denoting $S_{yy} = \sum_{i=1}^{n}(y_i-\bar y)^2$ and since $R^2 = 1 - \frac{\sum_{i=1}^{n}\hat u_i^2}{S_{yy}}$, we can write

$$ (1- \bar R^2) = \frac {n-1}{n-k} \left(1-1 + \frac{\sum_{i=1}^{n}\hat u_i^2}{S_{yy}}\right) = \frac {n-1}{S_{yy}}\, \frac{\sum_{i=1}^{n}\hat u_i^2}{n-k}$$

$$\Rightarrow (1- \bar R^2)\frac {S_{yy}}{n-1} = \hat \sigma^2 \qquad [2]$$
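
Identity $[2]$ is easy to verify numerically. The Python sketch below simulates an arbitrary regression (the data-generating process and seed are purely illustrative) and checks that $(1-\bar R^2)\,S_{yy}/(n-1)$ equals $\hat\sigma^2 = \sum_{i}\hat u_i^2/(n-k)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 4                              # k regressors including the constant
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
rss = resid @ resid
syy = np.sum((y - y.mean()) ** 2)

r2 = 1 - rss / syy
r2_adj = 1 - (n - 1) / (n - k) * (1 - r2)
sigma2_hat = rss / (n - k)

# Both sides of identity [2] agree up to floating-point error
print((1 - r2_adj) * syy / (n - 1), sigma2_hat)
```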

By dropping a regressor, $S_{yy}$ and $n$ remain unaffected. So as a matter of mathematical necessity, the term $(1- \bar R^2)$ on the LHS of $[2]$ moves in the same direction as its RHS, meaning that as the OLS estimated variance of the regression decreases, so does $(1-\bar R^2)$, and hence $\bar R^2$ increases as $\hat \sigma^2$ decreases.

Consider now dropping one regressor, and index the various quantities related to this restricted regression with $r$. Denote by $RSS$ the residual sum of squares.

The F-statistic to test whether the restricted regression with $k-1$ regressors is "better" than the regression with $k$ regressors is

$$F(1,n-k)= \frac{RSS_r -RSS}{RSS/(n-k)} = \frac {(n-k+1)\hat \sigma_r^2 - (n-k)\hat \sigma^2}{\hat \sigma^2}$$

$$=(n-k+1)\frac {\hat \sigma_r^2}{\hat \sigma^2} - (n-k) \Rightarrow \frac {\hat \sigma_r^2}{\hat \sigma^2} = \frac {F+(n-k)}{1+(n-k)} \qquad [3]$$

From $[3]$ it is obvious that if

$$F<1 \Rightarrow \hat \sigma_r^2 <\hat \sigma^2 \Rightarrow \bar R_r^2 > \bar R^2$$

And since $F(1,n-k)=t^2$, we have $F<1 \Rightarrow |t|<1$.
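
A small simulation confirms both the equality $F(1,n-k)=t^2$ and relation $[3]$. The data-generating process below is made up, with a deliberately weak coefficient on the last regressor so that its $|t|$ will often be below 1:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 60, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
# weak true coefficient on the last regressor (purely illustrative)
y = X @ np.array([1.0, 1.5, -0.8, 0.05]) + rng.normal(size=n)

def fit(Xm, y):
    b, *_ = np.linalg.lstsq(Xm, y, rcond=None)
    resid = y - Xm @ b
    return b, resid @ resid

b, rss = fit(X, y)                 # full model, k regressors
_, rss_r = fit(X[:, :-1], y)       # restricted model, last regressor dropped

sigma2 = rss / (n - k)
sigma2_r = rss_r / (n - k + 1)

F = (rss_r - rss) / sigma2         # F(1, n-k) for the single restriction
t = b[-1] / np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[-1, -1])

print(F, t**2)                                            # F equals t^2
print(sigma2_r / sigma2, (F + (n - k)) / (1 + (n - k)))   # relation [3]

# The conclusion: adjusted R^2 rises after dropping the regressor exactly when F < 1
syy = np.sum((y - y.mean()) ** 2)
adj_full = 1 - (n - 1) / (n - k) * (rss / syy)
adj_restr = 1 - (n - 1) / (n - k + 1) * (rss_r / syy)
print(F < 1, adj_restr > adj_full)   # the two booleans agree
```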

Beware that the above results hold only when considering dropping just one regressor. Assume that we run the initial regression with $k$ regressors and we observe that two of them have t-ratios smaller than unity in absolute value. This does not necessarily imply that if we drop both simultaneously, we will end up with a higher $\bar R^2$.

Now think in reverse: start from the "restricted" model and add one variable.
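
For the direction the question actually asks about, the same check can be run by starting from the "restricted" model and adding one candidate regressor. A hedged sketch with made-up data: the adjusted $R^2$ should rise exactly when the added variable's $|t|$ exceeds 1 (equivalently, when its $F = t^2$ exceeds 1).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 80
x1 = rng.normal(size=n)
x_new = rng.normal(size=n)                 # candidate variable to add
y = 1.0 + 2.0 * x1 + 0.1 * x_new + rng.normal(size=n)

def adj_r2_and_t_last(Xm, y):
    """Return the adjusted R^2 and the t-ratio of the last regressor."""
    n, k = Xm.shape
    b, *_ = np.linalg.lstsq(Xm, y, rcond=None)
    resid = y - Xm @ b
    rss = resid @ resid
    syy = np.sum((y - y.mean()) ** 2)
    r2_adj = 1 - (n - 1) / (n - k) * (rss / syy)
    se_last = np.sqrt(rss / (n - k) * np.linalg.inv(Xm.T @ Xm)[-1, -1])
    return r2_adj, b[-1] / se_last

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([X_small, x_new])

adj_small, _ = adj_r2_and_t_last(X_small, y)
adj_big, t_new = adj_r2_and_t_last(X_big, y)

# Adjusted R^2 increases exactly when |t| of the added variable exceeds 1
print(adj_big > adj_small, abs(t_new) > 1)   # the two booleans agree
```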
