Let's consider a multiple linear regression model:
$ \hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 $ (1)
which produces adjusted $R^2 = r_1$.
Now I want to add one predictor to (1), which turns it into:
$ \hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 $ (2)
which produces adjusted $R^2 = r_2$.
If the data fed into (1) and (2) are exactly the same, is there any way to explain $r_2 < r_1$ apart from a code bug?
Best Answer
Yes, it's definitely possible for adjusted $R^2$ to decrease when you add parameters.
Ordinary $R^2$ can't decrease, but adjusted-$R^2$ certainly can. We can write the relationship between the two like so:
$R_{adj}^2 = R^2 - (1-R^2)\frac{p}{n-p-1}$
where $p$ is the number of predictors and $n$ is the number of observations.
Note that both factors in the product $(1-R^2)\cdot\frac{p}{n-p-1}$ are positive (unless $R^2=1$), so whenever $R^2<1$ we have $R_{adj}^2 < R^2$.
If $R^2<\frac{p}{n-1}$, then adjusted-$R^2$ will be negative.
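To check these relationships numerically, here's a quick sketch in R using the built-in mtcars data (any dataset would do):

fit <- lm(mpg ~ wt + hp, data = mtcars)   # p = 2 predictors, n = 32 observations
s   <- summary(fit)
r2  <- s$r.squared
n   <- nrow(mtcars); p <- 2
r2 - (1 - r2) * p / (n - p - 1)           # reproduces s$adj.r.squared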
$R_{adj}^2$ will decrease when a term is added if the second model's $R^2$ doesn't increase over the first model's by at least as much as would be expected when adding an unrelated variable.
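(Concretely: if the first model has $p$ predictors, a little algebra with the formula above shows that adding one more predictor raises $R_{adj}^2$ only when the new $R^2$ exceeds the old one by more than $(1-R^2)/(n-p-1)$.)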
We can see this happen quite easily: I just generated three unrelated variables in R:
x1=rnorm(20);x2=rnorm(20);y=rnorm(20)
If we fit a linear regression with just the first $x$ (lm(y~x1)), the adjusted $R^2$ is smaller than that of the null model (which is 0):
Multiple R-squared: 0.0007048, Adjusted R-squared: -0.05481
If we fit both independent variables (lm(y~x1+x2)), the adjusted $R^2$ goes down again (and the $R^2$, necessarily, goes up):
Multiple R-squared: 0.00199, Adjusted R-squared: -0.1154
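Putting the demonstration together as a self-contained script (the output above came from an unseeded run, so exact numbers will differ):

set.seed(1)                       # any seed; for reproducibility only
x1 <- rnorm(20); x2 <- rnorm(20); y <- rnorm(20)
fit1 <- lm(y ~ x1)                # one unrelated predictor
fit2 <- lm(y ~ x1 + x2)           # two unrelated predictors
summary(fit1)$r.squared;     summary(fit2)$r.squared      # R^2 can only go up
summary(fit1)$adj.r.squared; summary(fit2)$adj.r.squared  # adjusted R^2 may go down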
For adjusted $R^2$ to increase, the added variable has to explain more additional variation in the data than would be expected from an unrelated variable; just by chance, an unrelated variable can add even less than its expected contribution.
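To quantify "just by chance," here's an illustrative simulation (not part of the original answer) counting how often adjusted $R^2$ drops when a pure-noise predictor is added:

set.seed(42)
drops <- replicate(2000, {
  x1 <- rnorm(20); x2 <- rnorm(20); y <- rnorm(20)
  a1 <- summary(lm(y ~ x1))$adj.r.squared
  a2 <- summary(lm(y ~ x1 + x2))$adj.r.squared
  a2 < a1                         # TRUE when the added noise variable lowers adjusted R^2
})
mean(drops)                       # proportion of runs where adjusted R^2 decreased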