Can adjusted R-squared be equal to 1?

I have a dataset with around 15 independent variables. I am fitting a multiple regression model and, for model selection, using backward elimination based on p-values. The adjusted R^2 for the model with all predictors is exactly 1. At this point I suspected that the model might just be picking up noise. But after removing 5 predictor variables via the selection procedure, the adjusted R^2 is still 1. I am not sure whether this is correct or whether I am just modeling noise. Can someone comment on this?

Dan and Michael point out the relevant issues. Just for completeness, the relationship between adjusted $R^2$ and $R^2$ is given by (see, e.g., here)

$$ R^2_{adjusted}=1-(1-R^2)\frac{n-1}{n-K}, $$ (with $K$ the number of regressors, including the constant). This shows that $R^2_{adjusted}=1$ if $R^2=1$, unless (see below) $K=n$.
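This relationship is easy to verify numerically against what R itself reports. A minimal sketch on a toy regression (the data here are made up purely for illustration):

```r
# Check the adjusted R^2 formula against summary.lm()'s own value
set.seed(1)
n  <- 50
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + x1 + rnorm(n)
fit <- summary(lm(y ~ x1 + x2))
K <- 3                                              # two slopes plus the constant
manual <- 1 - (1 - fit$r.squared) * (n - 1)/(n - K) # the formula above
all.equal(manual, fit$adj.r.squared)                # TRUE
```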

$R^2=1$ occurs when all residuals $\hat u_i=y_i-\hat y_i$ are zero, as $$ R^2=1-\frac{\hat{u}'\hat{u}/n}{\tilde{y}'\tilde{y}/n}. $$ Here, $\hat u$ denotes the vector of residuals and $\tilde y$ the vector of demeaned observations on the dependent variable.
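The same definition can be computed by hand from the residuals and the demeaned dependent variable; a small sketch (again with artificial data) confirms it matches R's reported $R^2$:

```r
# R^2 computed directly from residuals and demeaned y
set.seed(2)
n <- 40
x <- rnorm(n)
y <- 2 + 0.5*x + rnorm(n)
fit    <- lm(y ~ x)
uhat   <- residuals(fit)       # the vector u-hat
ytilde <- y - mean(y)          # the demeaned y
R2 <- 1 - sum(uhat^2)/sum(ytilde^2)
all.equal(R2, summary(fit)$r.squared)  # TRUE
```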

Dan discusses one reason to get an $R^2$ of 1. Another is to have as many regressors as observations, i.e., $K=n$.

Technically, this is because the $n\times K$ regressor matrix $X$ then is square. The OLS estimator $\hat\beta=(X'X)^{-1}X'y$ can then be written as (assuming no exact multicollinearity) $$ \hat\beta=(X'X)^{-1}X'y=X^{-1}{X'}^{-1}X'y=X^{-1}y, $$ so that the fitted values $\hat y=X\hat\beta$ are just $\hat y=XX^{-1}y=y$, so that all residuals are zero.
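A quick sketch of this algebra in R, with a square regressor matrix built from random (hence, almost surely, non-collinear) columns:

```r
# With K = n the regressor matrix X is square, betahat = X^{-1} y,
# and the fitted values reproduce y exactly
set.seed(3)
n <- 5
X <- cbind(1, matrix(rnorm(n*(n-1)), ncol = n-1))  # constant + n-1 regressors
y <- rnorm(n)
betahat <- solve(X) %*% y   # equals (X'X)^{-1} X'y when X is invertible
fitted  <- X %*% betahat
max(abs(fitted - y))        # numerically zero: all residuals vanish
```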

Here is an illustration using artificial data (code below), in which regressors are generated totally independently of $y$, and yet we achieve an $R^2$ of 1 once we have as many of them as we have observations.


```r
n <- 15
regressors <- n-1 # enough, as we'll also fit a constant
y <- rnorm(n)
X <- matrix(rnorm(regressors*n), ncol=regressors)

collectionR2s <- rep(NA, regressors)
for (i in 1:regressors){
  collectionR2s[i] <- summary(lm(y~X[,1:i]))$r.squared
}
plot(1:regressors, collectionR2s, col="purple", pch=19, type="b", lwd=2)
abline(h=1, lty=2)
```

When $K=n$, R, however, correctly does not report an adjusted $R^2$:

```
> summary(lm(y~X))

Call:
lm(formula = y ~ X)

Residuals:
ALL 15 residuals are 0: no residual degrees of freedom!

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.36296         NA      NA       NA
X1          -1.09003         NA      NA       NA
X2           0.39177         NA      NA       NA
X3           0.19273         NA      NA       NA
X4           0.51528         NA      NA       NA
X5          -0.04530         NA      NA       NA
X6          -1.28539         NA      NA       NA
X7          -0.72770         NA      NA       NA
X8          -0.14604         NA      NA       NA
X9           0.34385         NA      NA       NA
X10         -0.93811         NA      NA       NA
X11          2.23064         NA      NA       NA
X12          0.06744         NA      NA       NA
X13          0.21220         NA      NA       NA
X14         -2.29134         NA      NA       NA

Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:    NaN
F-statistic:   NaN on 14 and 0 DF,  p-value: NA
```
