I am curious about the statement made at the bottom of the first page of this text regarding the $R^2_\mathrm{adjusted}$ adjustment

$$R^2_\mathrm{adjusted} = 1-(1-R^2)\left(\frac{n-1}{n-m-1}\right).$$

The text states:

> The logic of the adjustment is the following: in ordinary multiple regression, a random predictor explains on average a proportion $1/(n-1)$ of the response's variation, so that $m$ random predictors together explain, on average, $m/(n-1)$ of the response's variation; in other words, the expected value of $R^2$ is $\mathbb{E}(R^2) = m/(n-1)$. Applying the [$R^2_\mathrm{adjusted}$] formula to that value, where all predictors are random, gives $R^2_\mathrm{adjusted} = 0$.

This seems to be a very simple and interpretable motivation for $R^2_\mathrm{adjusted}$. However, I have not been able to work out that $\mathbb{E}(R^2) = 1/(n-1)$ for a single random (i.e. uncorrelated) predictor.

Could someone point me in the right direction here?
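For what it's worth, a quick Monte Carlo sketch (my own check; the sample size and trial count are arbitrary choices) does seem consistent with the $1/(n-1)$ claim for a single random predictor:

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 20, 5000

# Simple regression R^2 with one predictor is just the squared
# sample correlation between x and y.
r2s = np.empty(trials)
for i in range(trials):
    x = rng.standard_normal(n)  # predictor, independent of y
    y = rng.standard_normal(n)  # response
    r = np.corrcoef(x, y)[0, 1]
    r2s[i] = r * r

print(r2s.mean())   # close to 1/(n-1)
print(1 / (n - 1))  # 0.0526...
```

So empirically the claim holds, but I would like to see the derivation.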


#### Best Answer

This is accurate mathematical statistics. See **this post** for the derivation of the distribution of $R^2$ *under the hypothesis that all regressors (bar the constant term) are uncorrelated with the dependent variable* ("random predictors").

This distribution is a Beta, where $m$ is the number of predictors *not* counting the constant term and $n$ is the sample size:

$$R^2 \sim \mathrm{Beta}\left(\frac{m}{2}, \frac{n-m-1}{2}\right)$$

and so

$$\mathbb{E}(R^2) = \frac{m/2}{(m/2)+[(n-m-1)/2]} = \frac{m}{n-1}.$$

This appears to be a clever way to "justify" the logic behind the adjusted $R^2$: if indeed all regressors are uncorrelated with the response, then the adjusted $R^2$ is "on average" zero.
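As a sanity check on the multiple-regression case, here is a minimal simulation sketch (my own; it assumes i.i.d. Gaussian predictors and response, and the variable names are illustrative) that estimates $\mathbb{E}(R^2)$ by OLS and then applies the adjustment formula to that mean:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, trials = 30, 5, 2000

r2s = np.empty(trials)
for i in range(trials):
    X = rng.standard_normal((n, m))  # m random predictors
    y = rng.standard_normal(n)       # response, independent of X
    Xc = np.column_stack([np.ones(n), X])  # add the constant term
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    tss = (y - y.mean()) @ (y - y.mean())
    r2s[i] = 1 - (resid @ resid) / tss

mean_r2 = r2s.mean()
print(mean_r2)      # close to m/(n-1)
print(m / (n - 1))  # 0.1724...

# Plug the mean R^2 into the adjustment formula: result is near zero.
adj = 1 - (1 - mean_r2) * (n - 1) / (n - m - 1)
print(adj)
```

The Monte Carlo mean of $R^2$ lands near $m/(n-1) = 5/29$, and feeding that mean through the adjustment formula gives approximately zero, exactly as the quoted passage argues.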
