# Can we find bounds on R-squared?

We know that as the number of independent variables increases, the coefficient of determination \$R^2\$ never decreases, while the adjusted \$R^2\$ may or may not increase. In what follows, for the sake of simplicity, I write only \$R^2\$, but the question applies to both \$R^2\$ and the adjusted \$R^2\$. Further, to make life easier, assume that all the conditions and assumptions of multiple regression are satisfied.

Question: Consider a multiple regression where the dependent variable \$y\$ depends on at most \$n\$ independent variables \$x_1, x_2, \ldots, x_n\$. For a given \$k\$, \$1 \le k \le n\$, we find the best \$k\$-variable linear fit for \$y\$. Let us denote the coefficient of determination of this best fit by \$R_{max}^2(k)\$.

Similarly we find the worst possible \$k\$ variable fit and we denote its coefficient of determination by \$R_{min}^2(k)\$.
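For small \$n\$, the two quantities defined above can be computed by exhaustive enumeration. The following is a minimal sketch, not the poster's program: the helper names `solve`, `r_squared` and `r2_bounds` are hypothetical, the data are synthetic, and the fit uses the normal equations in plain Python only to stay self-contained (a library least-squares routine would be the more robust choice in practice).

```python
from itertools import combinations
import random


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(m):
        # Swap in the row with the largest pivot to limit round-off.
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


def r_squared(X, y, cols):
    """R^2 of the least-squares fit of y on an intercept plus the columns in cols."""
    rows = [[1.0] + [row[j] for j in cols] for row in X]
    p = len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def r2_bounds(X, y, k):
    """Return (R2_min(k), R2_max(k)) by enumerating every k-subset of the columns."""
    n = len(X[0])
    values = [r_squared(X, y, cols) for cols in combinations(range(n), k)]
    return min(values), max(values)


# Synthetic data (an assumption for illustration): 5 candidate variables,
# of which only the first three actually drive y.
random.seed(0)
n_obs, n_vars = 60, 5
X = [[random.gauss(0, 1) for _ in range(n_vars)] for _ in range(n_obs)]
y = [2 * r[0] + r[1] + 0.5 * r[2] + random.gauss(0, 1) for r in X]

bounds = {k: r2_bounds(X, y, k) for k in range(1, n_vars + 1)}
for k, (lo, hi) in bounds.items():
    print(k, round(lo, 4), round(hi, 4))
```

The enumeration visits all \$\binom{n}{k}\$ subsets, so it is only practical for small \$n\$; it is meant as a reference against which faster search methods or conjectured bounds can be checked.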

Trivially we have the following bounds

\$\$
R_{min}^2(k) \ge R_{min}^2(1)
\$\$
and
\$\$
R_{max}^2(k) \le R_{max}^2(n) = R^2(n).
\$\$
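
For the unadjusted \$R^2\$, the reasoning behind these bounds can be sketched as follows. Adding a regressor to a least-squares fit cannot increase the residual sum of squares, so for nested variable sets \$S \subseteq T\$ we have \$R^2(S) \le R^2(T)\$. Every \$k\$-variable set contains a single-variable set and is contained in the full set, which gives the chain

\$\$
R_{min}^2(1) \le R_{min}^2(k) \le R_{max}^2(k) \le R_{max}^2(n) = R^2(n).
\$\$

The same nesting argument shows that both \$R_{min}^2(k)\$ and \$R_{max}^2(k)\$ are non-decreasing in \$k\$. It does not carry over to the adjusted \$R^2\$, which can decrease when a variable is added, so for the adjusted version these inequalities should be treated as unverified.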

My question is: can we improve on the above bounds and express them as non-trivial functions of \$n\$, \$k\$, \$R_{min}^2(1)\$ and \$R_{max}^2(n)\$? Are any additional assumptions required to obtain such non-trivial bounds?

Motivation: I am currently working on linear modeling where I have a large number of independent variables, and I need a way to determine how small or large the coefficient of determination can be for a given \$k\$. Currently I am trying various algorithmic approaches and writing programs that compute the above bounds. This is not very useful because, even with the best known algorithms such as leaps, the computation takes a long time as the number of variables increases. Therefore I want to see whether a theoretical bound is possible.

My progress so far: Based on data I have generated using a computer program, I find empirically that \$R_{max}^2(k)\$ approximately follows a logistic model

\$\$
R_{max}^2(k) \approx \frac{R^2(n)}{1+ae^{-bk}}
\$\$
where \$a\$ and \$b\$ are constants that depend on the data being analyzed.
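
One way to estimate \$a\$ and \$b\$ from computed values of \$R_{max}^2(k)\$ is to linearize the model: rearranging gives \$\log\left(R^2(n)/R_{max}^2(k) - 1\right) = \log a - bk\$, a straight line in \$k\$. The sketch below fits that line by ordinary least squares. The function name `fit_logistic` is hypothetical, and the input values are generated from the model itself with assumed constants, so the example only checks the algebra and says nothing about how well real data follow the logistic form.

```python
import math


def fit_logistic(ks, r2_max, r2_full):
    """Estimate a, b in r2_max(k) ~ r2_full / (1 + a*exp(-b*k)) via a log-linear fit.

    Uses log(r2_full / r2_max(k) - 1) = log(a) - b*k. Points with
    r2_max(k) >= r2_full (or <= 0) are skipped because the log is undefined there.
    """
    pts = [(k, math.log(r2_full / r - 1.0))
           for k, r in zip(ks, r2_max) if 0.0 < r < r2_full]
    if len(pts) < 2:
        raise ValueError("need at least two points with 0 < r2_max(k) < r2_full")
    m = len(pts)
    mean_k = sum(k for k, _ in pts) / m
    mean_z = sum(z for _, z in pts) / m
    # Simple linear regression of z on k.
    sxy = sum((k - mean_k) * (z - mean_z) for k, z in pts)
    sxx = sum((k - mean_k) ** 2 for k, _ in pts)
    slope = sxy / sxx
    intercept = mean_z - slope * mean_k
    return math.exp(intercept), -slope


# Assumed constants for illustration only: a = 4, b = 0.8, R^2(n) = 0.9.
true_a, true_b, r2_full = 4.0, 0.8, 0.9
ks = list(range(1, 9))
r2_max = [r2_full / (1.0 + true_a * math.exp(-true_b * k)) for k in ks]

a_hat, b_hat = fit_logistic(ks, r2_max, r2_full)
print(a_hat, b_hat)
```

The log-linear fit minimizes error on the transformed scale rather than on the \$R^2\$ scale, so with noisy data it may weight points differently from a direct nonlinear least-squares fit. Note also that the model stays strictly below \$R^2(n)\$ for every finite \$k\$, so it cannot reproduce \$R_{max}^2(n) = R^2(n)\$ exactly.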
