Solved – Expected value of $R^2$, the coefficient of determination, under the null hypothesis

I am curious about the statement made at the bottom of the first page in this text regarding the $R^2_\mathrm{adjusted}$ adjustment

$$R^2_\mathrm{adjusted} = 1-(1-R^2)\left(\frac{n-1}{n-m-1}\right).$$

The text states:

"The logic of the adjustment is the following: in ordinary multiple regression, a random predictor explains on average a proportion $1/(n - 1)$ of the response's variation, so that $m$ random predictors explain together, on average, $m/(n - 1)$ of the response's variation; in other words, the expected value of $R^2$ is $\mathbb{E}(R^2) = m/(n - 1)$. Applying the [$R^2_\mathrm{adjusted}$] formula to that value, where all predictors are random, gives $R^2_\mathrm{adjusted} = 0$."

This seems to be a very simple and interpretable motivation for $R^2_\mathrm{adjusted}$. However, I have not been able to work out that $\mathbb{E}(R^2) = 1/(n - 1)$ for a single random (i.e. uncorrelated) predictor.

Could someone point me in the right direction here?

This is accurate mathematical statistics. See this post for the derivation of the distribution of $R^2$ under the hypothesis that all regressors (bar the constant term) are uncorrelated with the dependent variable ("random predictors").

This distribution is a Beta, with $m$ being the number of predictors without counting the constant term, and $n$ the sample size,

$$R^2 \sim \mathrm{Beta}\left(\frac{m}{2}, \frac{n-m-1}{2}\right)$$

and so

$$E(R^2) = \frac{m/2}{(m/2)+[(n-m-1)/2]} = \frac{m}{n-1}$$
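A quick Monte Carlo check can make this expectation concrete. The sketch below is only an illustration, not part of the cited derivation: it assumes independent standard normal predictors and response, and the function name `simulate_r2` and its parameters are hypothetical. It fits ordinary least squares with an intercept many times and averages $R^2$ and the adjusted $R^2$.

```python
import numpy as np


def simulate_r2(n=30, m=3, reps=20000, seed=0):
    """Average R^2 and adjusted R^2 when all m predictors are pure noise.

    Assumption: predictors and response are independent standard normals.
    """
    rng = np.random.default_rng(seed)
    r2 = np.empty(reps)
    for i in range(reps):
        X = rng.standard_normal((n, m))      # m predictors unrelated to y
        y = rng.standard_normal(n)           # response, independent of X
        A = np.column_stack([np.ones(n), X])  # design matrix with constant term
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        ss_res = resid @ resid
        ss_tot = np.sum((y - y.mean()) ** 2)
        r2[i] = 1.0 - ss_res / ss_tot
    # Adjusted R^2 computed with the formula from the question
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - m - 1)
    return r2.mean(), adj.mean()


if __name__ == "__main__":
    n, m = 30, 3
    mean_r2, mean_adj = simulate_r2(n=n, m=m)
    print("simulated mean R^2:     ", mean_r2)
    print("theoretical m/(n-1):    ", m / (n - 1))
    print("simulated mean adj. R^2:", mean_adj)
```

Under these assumptions the simulated mean of $R^2$ should land close to $m/(n-1)$ and the mean adjusted $R^2$ close to zero, up to Monte Carlo error.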

This appears to be a clever way to "justify" the logic behind the adjusted $R^2$: if indeed all regressors are uncorrelated, then the adjusted $R^2$ is "on average" zero.
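To spell out the substitution: the adjustment is an affine function of $R^2$, so its expected value follows by plugging in $E(R^2) = m/(n-1)$:

$$E\left(R^2_\mathrm{adjusted}\right) = 1-\left(1-\frac{m}{n-1}\right)\left(\frac{n-1}{n-m-1}\right) = 1-\left(\frac{n-m-1}{n-1}\right)\left(\frac{n-1}{n-m-1}\right) = 0.$$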
