Consider the simple linear model:

$$\pmb{y}=X'\pmb{\beta}+\epsilon$$

where $\epsilon_i\sim\mathrm{i.i.d.}\;\mathcal{N}(0,\sigma^2)$, $X\in\mathbb{R}^{n\times p}$, $p\geq2$, and $X$ contains a column of constants.

My question is: given $\mathrm{E}(X'X)$, $\beta$ and $\sigma$, is there a formula for a non-trivial upper bound on $\mathrm{E}(R^2)$*? (Assuming the model was estimated by OLS.)

*I assumed, writing this, that getting $E(R^2)$ itself would not be possible.


# EDIT1

Using the solution derived by Stéphane Laurent (see below), we can get a non-trivial upper bound on $\mathrm{E}(R^2)$. Some numerical simulations (below) show that this bound is actually pretty tight.

Stéphane Laurent derived the following: $R^2\sim\mathrm{B}(p-1,n-p,\lambda)$, where $\mathrm{B}(p-1,n-p,\lambda)$ is a non-central Beta distribution with non-centrality parameter $\lambda$ given by

$$\lambda=\frac{\|X'\beta-\mathrm{E}(X)'\beta1_n\|^2}{\sigma^2}$$

So

$$\mathrm{E}(R^2)=\mathrm{E}\left(\frac{\chi^2_{p-1}(\lambda)}{\chi^2_{p-1}(\lambda)+\chi^2_{n-p}}\right)\leq\frac{\mathrm{E}\left(\chi^2_{p-1}(\lambda)\right)}{\mathrm{E}\left(\chi^2_{p-1}(\lambda)\right)+\mathrm{E}\left(\chi^2_{n-p}\right)}$$

where $\chi^2_{k}(\lambda)$ is a non-central $\chi^2$ with parameter $\lambda$ and $k$ degrees of freedom. So a non-trivial upper bound for $\mathrm{E}(R^2)$ is

$$\frac{\lambda+p-1}{\lambda+n-1}$$
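One way to see why this expression bounds $\mathrm{E}(R^2)$ from above is sketched here; it is not part of the original derivation. The sketch assumes the standard Poisson-mixture representation of the non-central $\chi^2$, in which $\chi^2_{p-1}(\lambda)$ is a central $\chi^2_{p-1+2J}$ with $J\sim\mathrm{Poisson}(\lambda/2)$. Conditionally on $J$ the ratio is an ordinary Beta variable with shapes $(p-1+2J)/2$ and $(n-p)/2$, so

$$\mathrm{E}(R^2\mid J)=\frac{p-1+2J}{n-1+2J}=1-\frac{n-p}{n-1+2J},$$

which is concave in $J$ whenever $n>p$. Jensen's inequality with $\mathrm{E}(J)=\lambda/2$ then gives

$$\mathrm{E}(R^2)=\mathrm{E}\left(\frac{p-1+2J}{n-1+2J}\right)\leq\frac{p-1+\lambda}{n-1+\lambda},$$

with equality when $\lambda=0$.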

It is *very* tight (much tighter than I had expected would be possible). For example, using:

`rho <- 0.75; p <- 10; n <- 25*p; Su <- matrix(rho, p-1, p-1); diag(Su) <- 1; su <- 1; set.seed(123); bet <- runif(p)`

the mean of the $R^2$ over 1000 simulations is `0.960819`. The theoretical upper bound above gives `0.9609081`. The bound seems to be equally precise across many values of $R^2$. Truly astounding!
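A comparison of this kind can be sketched without refitting OLS, by drawing $R^2$ directly from the ratio-of-chi-squares representation. The following is a minimal Python sketch using only the standard library, not the simulation used for the numbers above. The function names and the values $\lambda=50$, $n=250$, $p=10$ are assumptions chosen for illustration.

```python
import math
import random


def r2_bound(lam, n, p):
    """Upper bound (lam + p - 1) / (lam + n - 1) on E(R^2)."""
    return (lam + p - 1) / (lam + n - 1)


def simulate_mean_r2(lam, n, p, reps=20000, seed=123):
    """Monte Carlo mean of A / (A + B), with A ~ noncentral chi2_{p-1}(lam)
    and B ~ central chi2_{n-p}, drawn independently."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # Noncentral chi-square with p-1 df: one shifted standard normal,
        # squared, plus a central chi-square with p-2 df.
        a = (rng.gauss(0.0, 1.0) + math.sqrt(lam)) ** 2
        if p > 2:
            # chi2_k is Gamma(shape=k/2, scale=2)
            a += rng.gammavariate((p - 2) / 2.0, 2.0)
        b = rng.gammavariate((n - p) / 2.0, 2.0)
        total += a / (a + b)
    return total / reps


# Hypothetical setting, not the one from the post
lam, n, p = 50.0, 250, 10
mean_r2 = simulate_mean_r2(lam, n, p)
bound = r2_bound(lam, n, p)
print(mean_r2, bound)
```

With a moderate $\lambda$ like this, the simulated mean should sit slightly below the bound, up to Monte Carlo error. Increasing `lam` should narrow the gap.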

# EDIT2:

After further research, it appears that the quality of the upper-bound approximation to $\mathrm{E}(R^2)$ gets better as $\lambda+p$ increases (and, all else equal, $\lambda$ increases with $n$).

#### Best Answer

Any linear model can be written $\boxed{Y=\mu+\sigma G}$ where $G$ has the standard normal distribution on $\mathbb{R}^n$ and $\mu$ is assumed to belong to a linear subspace $W$ of $\mathbb{R}^n$. In your case $W=\text{Im}(X)$.

Let $[1] \subset W$ be the one-dimensional linear subspace generated by the vector $(1,1,\ldots,1)$. Taking $U=[1]$ below, the $R^2$ is closely related to the classical Fisher statistic $$ F = \frac{{\Vert P_Z Y\Vert}^2/(m-\ell)}{{\Vert P_W^\perp Y\Vert}^2/(n-m)}, $$ for the hypothesis test of $H_0\colon\{\mu \in U\}$, where $U\subset W$ is a linear subspace, $Z=U^\perp \cap W$ denotes the orthogonal complement of $U$ in $W$, $m=\dim(W)$ and $\ell=\dim(U)$ (so $m=p$ and $\ell=1$ in your situation).

Indeed, $$ \dfrac{{\Vert P_Z Y\Vert}^2}{{\Vert P_W^\perp Y\Vert}^2} = \frac{R^2}{1-R^2} $$ because the definition of $R^2$ is $$R^2 = \frac{{\Vert P_Z Y\Vert}^2}{{\Vert P_U^\perp Y\Vert}^2}=1 - \frac{{\Vert P^\perp_W Y\Vert}^2}{{\Vert P_U^\perp Y\Vert}^2}.$$
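As a worked version of this step, using only the definitions above: since $\mathbb{R}^n = U \oplus Z \oplus W^\perp$ is an orthogonal decomposition, $P_U^\perp = P_Z + P_W^\perp$, and therefore

$$ {\Vert P_U^\perp Y\Vert}^2 = {\Vert P_Z Y\Vert}^2 + {\Vert P_W^\perp Y\Vert}^2, \qquad \frac{R^2}{1-R^2} = \frac{m-\ell}{n-m}\,F, \qquad R^2 = \frac{(m-\ell)F}{(m-\ell)F + (n-m)}. $$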

Obviously $\boxed{P_Z Y = P_Z \mu + \sigma P_Z G}$ and $\boxed{P_W^\perp Y = \sigma P_W^\perp G}$.

**When $H_0\colon\{\mu \in U\}$ is true** then $P_Z \mu = 0$ and therefore $$ F = \frac{{\Vert P_Z G\Vert}^2/(m-\ell)}{{\Vert P_W^\perp G\Vert}^2/(n-m)} \sim F_{m-\ell,n-m} $$ has the Fisher $F_{m-\ell,n-m}$ distribution. Consequently, from the classical relation between the Fisher distribution and the Beta distribution, $R^2 \sim {\cal B}(m-\ell, n-m)$.

**In the general situation** we have to deal with $P_Z Y = P_Z \mu + \sigma P_Z G$ when $P_Z\mu \neq 0$. In this general case one has ${\Vert P_Z Y\Vert}^2 \sim \sigma^2\chi^2_{m-\ell}(\lambda)$, the noncentral $\chi^2$ distribution with $m-\ell$ degrees of freedom and noncentrality parameter $\boxed{\lambda=\frac{{\Vert P_Z \mu\Vert}^2}{\sigma^2}}$, and then $\boxed{F \sim F_{m-\ell,n-m}(\lambda)}$ (noncentral Fisher distribution). This is the classical result used to compute the power of $F$-tests.

The classical relation between the Fisher distribution and the Beta distribution holds in the noncentral situation too. Hence $R^2$ has the noncentral beta distribution with "shape parameters" $m-\ell$ and $n-m$ and noncentrality parameter $\lambda$. I think the moments are available in the literature, but they are possibly highly complicated.

Finally let us write down $P_Z\mu$. Note that $P_Z = P_W - P_U$. One has $P_U \mu = \bar\mu 1$ when $U=[1]$, and $P_W \mu = \mu$. Hence $P_Z \mu =\mu - \bar\mu 1$, where here $\mu=X\beta$ for the unknown parameter vector $\beta$.
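For concreteness, $\lambda = {\Vert \mu - \bar\mu 1\Vert}^2/\sigma^2$ can be computed for a given design as in the small Python sketch below. The function name `noncentrality` and the toy design are assumptions for illustration only; the sketch treats $X$ as an $n\times p$ list of rows with $\mu = X\beta$.

```python
def noncentrality(X, beta, sigma):
    """lambda = ||mu - mean(mu) * 1||^2 / sigma^2, with mu = X beta."""
    # mu_i is the dot product of row i of X with beta
    mu = [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]
    mu_bar = sum(mu) / len(mu)
    return sum((m - mu_bar) ** 2 for m in mu) / sigma ** 2


# Toy design (hypothetical numbers): an intercept column plus one regressor
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
beta = [1.0, 2.0]
lam = noncentrality(X, beta, sigma=1.0)  # mu = [1, 3, 5], so lam = 8.0
print(lam)
```

Only the centred part of $\mu$ enters, so changing the intercept coefficient leaves $\lambda$ unchanged, while doubling $\sigma$ divides it by four.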
