# Solved – Conditional expectation of R-squared

Consider the simple linear model:

\$\$pmb{y}=X'pmb{beta}+epsilon\$\$

where \$epsilon_isimmathrm{i.i.d.};mathcal{N}(0,sigma^2)\$ and
\$Xinmathbb{R}^{ntimes p}\$, \$pgeq2\$ and \$X\$ contains a column of
constants.

My question is, given \$mathrm{E}(X'X)\$, \$beta\$ and \$sigma\$, is there a formula
for a non trivial upper bound on \$mathrm{E}(R^2)\$*? (assuming the model was estimated by OLS).

*I assumed, writing this, that getting \$E(R^2)\$ itself would not be possible.

Contents

# EDIT1

using the solution derived by Stéphane Laurent (see below) we can get a non trivial upper bound on \$E(R^2)\$. Some numerical simulations (below) show that this bound is
actually pretty tight.

Stéphane Laurent derived the following: \$R^2simmathrm{B}(p-1,n-p,lambda)\$
where \$mathrm{B}(p-1,n-p,lambda)\$ is a non-central Beta distribution with
non-centrality parameter \$lambda\$ with

\$\$lambda=frac{||X'beta-mathrm{E}(X)'beta1_n||^2}{sigma^2}\$\$

So

\$\$mathrm{E}(R^2)=mathrm{E}left(frac{chi^2_{p-1}(lambda)}{chi^2_{p-1}(lambda)+chi^2_{n-p}}right)geqfrac{mathrm{E}left(chi^2_{p-1}(lambda)right)}{mathrm{E}left(chi^2_{p-1}(lambda)right)+mathrm{E}left(chi^2_{n-p}right)}\$\$

where \$chi^2_{k}(lambda)\$ is a non-central \$chi^2\$ with parameter \$lambda\$ and \$k\$ degrees of freedom. So a non-trivial upper bound for \$mathrm{E}(R^2)\$ is

\$\$frac{lambda+p-1}{lambda+n-1}\$\$

it is very tight (much tighter than what I had expected would be possible):

for example, using:

``rho<-0.75 p<-10 n<-25*p Su<-matrix(rho,p-1,p-1) diag(Su)<-1 su<-1 set.seed(123) bet<-runif(p) ``

the mean of the \$R^2\$ over 1000 simulations is `0.960819`. The theoretical upper bound above gives `0.9609081`. The bound seems to be equally precise across many values of \$R^2\$. Truly astounding!

# EDIT2:

after further research, it appears that the quality of the upper bound approximation to \$E(R^2)\$ will get better as \$lambda+p\$ increases (and all else equal, \$lambda\$ increases with \$n\$).

Any linear model can be written \$boxed{Y=mu+sigma G}\$ where \$G\$ has the standard normal distribution on \$mathbb{R}^n\$ and \$mu\$ is assumed to belong to a linear subspace \$W\$ of \$mathbb{R}^n\$. In your case \$W=text{Im}(X)\$.

Let \$[1] subset W\$ be the one-dimensional linear subspace generated by the vector \$(1,1,ldots,1)\$. Taking \$U=[1]\$ below, the \$R^2\$ is highly related to the classical Fisher statistic \$\$ F = frac{{Vert P_Z YVert}^2/(m-ell)}{{Vert P_W^perp YVert}^2/(n-m)}, \$\$ for the hypothesis test of \$H_0colon{mu in U}\$ where \$Usubset W\$ is a linear subspace, and denoting by \$Z=U^perp cap W\$ the orthogonal complement of \$U\$ in \$W\$, and denoting \$m=dim(W)\$ and \$ell=dim(U)\$ (then \$m=p\$ and \$ell=1\$ in your situation).

Indeed, \$\$ dfrac{{Vert P_Z YVert}^2}{{Vert P_W^perp YVert}^2} = frac{R^2}{1-R^2} \$\$ because the definition of \$R^2\$ is \$\$R^2 = frac{{Vert P_Z YVert}^2}{{Vert P_U^perp YVert}^2}=1 – frac{{Vert P^perp_W YVert}^2}{{Vert P_U^perp YVert}^2}.\$\$

Obviously \$boxed{P_Z Y = P_Z mu + sigma P_Z G}\$ and \$boxed{P_W^perp Y = sigma P_W^perp G}\$.

When \$H_0colon{mu in U}\$ is true then \$P_Z mu = 0\$ and therefore \$\$ F = frac{{Vert P_Z GVert}^2/(m-ell)}{{Vert P_W^perp GVert}^2/(n-m)} sim F_{m-ell,n-m} \$\$ has the Fisher \$F_{m-ell,n-m}\$ distribution. Consequently, from the classical relation between the Fisher distribution and the Beta distribution, \$R^2 sim {cal B}(m-ell, n-m)\$.

In the general situation we have to deal with \$P_Z Y = P_Z mu + sigma P_Z G\$ when \$P_Zmu neq 0\$. In this general case one has \${Vert P_Z YVert}^2 sim sigma^2chi^2_{m-ell}(lambda)\$, the noncentral \$chi^2\$ distribution with \$m-ell\$ degrees of freedom and noncentrality parameter \$boxed{lambda=frac{{Vert P_Z muVert}^2}{sigma^2}}\$, and then \$boxed{F sim F_{m-ell,n-m}(lambda)}\$ (noncentral Fisher distribution). This is the classical result used to compute power of \$F\$-tests.

The classical relation between the Fisher distribution and the Beta distribution hold in the noncentral situation too. Finally \$R^2\$ has the noncentral beta distribution with "shape parameters" \$m-ell\$ and \$n-m\$ and noncentrality parameter \$lambda\$. I think the moments are available in the literature but they possibly are highly complicated.

Finally let us write down \$P_Zmu\$. Note that \$P_Z = P_W – P_U\$. One has \$P_U mu = barmu 1\$ when \$U=[1]\$, and \$P_W mu = mu\$. Hence \$P_Z mu =mu – barmu 1\$ where here \$mu=Xbeta\$ for the unknown parameters vector \$beta\$.

Rate this post