Someone claims that the adjusted $R^2$ will increase with the addition of an extra variable.
I wonder why, as it is called adjusted (in contrast to the normal $R^2$).
The only condition required (for the adjusted $R^2$ to increase) is that the F-statistic of the test of the null hypothesis that the new variable's coefficient is zero (by the way, is there a simple way to calculate it?) is greater than 1.
Can someone give me a hint about the link between the adjusted $R^2$ and the F-statistic of that test?
And anyway, who would want to include a new variable in a multiple OLS regression model if its beta was tested to be 0? Either way, the adjusted $R^2$ always changes.
Best Answer
The assertion of the question is true. We usually show the inverse situation, i.e. the case of dropping one variable. In a linear multiple regression model $\mathbf y = X\beta + \mathbf u$ with $n$ observations and $k$ regressors (including the constant term), if the t-ratio $t$ of a variable is less than 1 in absolute value, then dropping this one variable will increase the adjusted $R^2$, $\bar R^2$. When dealing with dropping one variable, the corresponding F-statistic (reflecting just one linear restriction) is equal to $t^2$ (see this post). So both should be smaller than unity for $\bar R^2$ to increase. This result can be proven as follows: $\bar R^2$ is defined by
$$ 1- \bar R^2 = \frac{n-1}{n-k}\,(1-R^2) \qquad [1]$$
Denoting $S_{yy} = \sum_{i=1}^{n}(y_i-\bar y)^2$ and since $R^2 = 1 - \frac{\sum_{i=1}^{n}\hat u_i^2}{S_{yy}}$, we can write
$$ (1- \bar R^2) = \frac{n-1}{n-k}\left(1-1 + \frac{\sum_{i=1}^{n}\hat u_i^2}{S_{yy}}\right) = \frac{n-1}{S_{yy}}\,\frac{\sum_{i=1}^{n}\hat u_i^2}{n-k}$$
$$\Rightarrow (1- \bar R^2)\frac{S_{yy}}{n-1} = \hat \sigma^2 \qquad [2]$$
By dropping a regressor, $S_{yy}$ and $n$ remain unaffected. So as a matter of mathematical necessity, the term $(1- \bar R^2)$ on the LHS of $[2]$ moves in the same direction as the RHS: as the OLS estimated variance of the regression decreases, so does $(1-\bar R^2)$, and hence $\bar R^2$ increases as $\hat \sigma^2$ decreases.
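As a quick numerical check of identity $[2]$, here is a minimal Python sketch (using numpy and statsmodels on made-up data; the variable names and parameter values are just illustrative assumptions) comparing $(1-\bar R^2)\,S_{yy}/(n-1)$ with the OLS residual-variance estimate $\hat\sigma^2$:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
X = sm.add_constant(rng.normal(size=(n, 3)))        # constant + 3 slopes => k = 4
y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.normal(size=n)

res = sm.OLS(y, X).fit()

S_yy = np.sum((y - y.mean()) ** 2)                   # total sum of squares
lhs = (1 - res.rsquared_adj) * S_yy / (n - 1)        # LHS of identity [2]
rhs = res.ssr / (n - X.shape[1])                     # hat sigma^2 = RSS / (n - k)

print(lhs, rhs)                                      # the two numbers should agree
```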
Consider now dropping one regressor, and index the various quantities related to this restricted regression with $r$. Denote by $RSS$ the residual sum of squares, so that $\hat \sigma^2 = RSS/(n-k)$ and $\hat \sigma_r^2 = RSS_r/(n-k+1)$.
The F-statistic to test whether the restricted regression with $k-1$ regressors is "better" than the regression with $k$ regressors is
$$F(1,n-k)= \frac{RSS_r -RSS}{RSS/(n-k)} = \frac{(n-k+1)\hat \sigma_r^2 - (n-k)\hat \sigma^2}{\hat \sigma^2}$$
$$=(n-k+1)\frac{\hat \sigma_r^2}{\hat \sigma^2} - (n-k) \Rightarrow \frac{\hat \sigma_r^2}{\hat \sigma^2} = \frac{F+(n-k)}{1+(n-k)} \qquad [3]$$
From $[3]$ it is obvious that
$$F<1 \Rightarrow \hat \sigma_r^2 <\hat \sigma^2 \Rightarrow \bar R_r^2 > \bar R^2$$
And since $F(1,n-k)=t^2$, we have $F<1 \Rightarrow |t|<1$.
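A hedged sketch of the one-variable-drop case, again with statsmodels on simulated data (an assumed design where the last regressor's true coefficient is zero, so its t-ratio is usually below 1 in absolute value; a particular random draw can of course land on the other side of the threshold, but the equivalence holds either way):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
X = sm.add_constant(rng.normal(size=(n, 3)))
y = X @ np.array([1.0, 0.5, -0.3, 0.0]) + rng.normal(size=n)   # last slope truly 0

full = sm.OLS(y, X).fit()                  # k regressors
restr = sm.OLS(y, X[:, :-1]).fit()         # drop the last regressor

t_last = full.tvalues[-1]                  # t-ratio of the dropped regressor
F, p, _ = full.compare_f_test(restr)       # F-test of the single restriction

print(F, t_last**2)                        # F(1, n-k) equals t^2
print(F < 1, restr.rsquared_adj > full.rsquared_adj)   # both True or both False
```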
Beware that the above results hold only when considering dropping just one regressor. Suppose we run the initial regression with $k$ regressors and observe that two of them have t-ratios smaller than unity in absolute value. This does not necessarily imply that if we drop both simultaneously we will end up with a higher $\bar R^2$, as the sketch below illustrates.
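To illustrate that caveat, here is a small sketch (hypothetical, nearly collinear data, again with statsmodels): each individual t-ratio tends to be far below 1, yet the joint F-statistic for dropping both regressors is large, so removing the pair lowers $\bar R^2$ rather than raising it:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)        # x2 almost identical to x1
X = sm.add_constant(np.column_stack([x1, x2]))
y = x1 + x2 + rng.normal(size=n)

full = sm.OLS(y, X).fit()
restr = sm.OLS(y, X[:, [0]]).fit()          # drop both x1 and x2, keep the constant

print(full.tvalues[1:])                     # individual t-ratios: typically both below 1
F, p, _ = full.compare_f_test(restr)        # joint test of the two restrictions
print(F, full.rsquared_adj, restr.rsquared_adj)   # F >> 1; adjusted R^2 falls when both dropped
```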
Now think in reverse: start from the "restricted" model and add one variable. By the same argument, $\bar R^2$ increases exactly when the added variable's F-statistic (equivalently, its squared t-ratio) exceeds unity, which is the assertion in the question.
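Seen from that direction, a minimal sketch of the question's original claim (assumed data and coefficient values, statsmodels again): add one regressor and check that its squared t-ratio exceeds 1 exactly when the larger model has the higher $\bar R^2$:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 150
X_small = sm.add_constant(rng.normal(size=(n, 2)))
x_new = rng.normal(size=n)                  # the candidate extra variable
y = X_small @ np.array([1.0, 0.5, -0.5]) + 0.2 * x_new + rng.normal(size=n)

small = sm.OLS(y, X_small).fit()
big = sm.OLS(y, np.column_stack([X_small, x_new])).fit()

t_new = big.tvalues[-1]                     # t-ratio of the added variable
print(t_new**2 > 1, big.rsquared_adj > small.rsquared_adj)   # the two should agree
```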