# Solved – Why does Mean Square Error decrease when I fit a higher degree polynomial

I understand the intuition that polynomial regression with a higher degree is able to fit the data better, as it decreases the bias while increasing the variance. However, I am unable to prove this mathematically. Can anyone explain why it is true, i.e. why it decreases $\| Y - X\beta \|^2$? Thank you!
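The phenomenon is easy to check numerically. A minimal sketch (my own illustration, not from the post, with made-up data): fit polynomials of increasing degree to the same points and watch the training MSE never increase.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 30)
y = np.sin(3 * x) + 0.3 * rng.standard_normal(30)

mses = []
for deg in range(1, 6):
    # Design matrix with an intercept column plus x^1 .. x^deg.
    X = np.vander(x, deg + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    mses.append(np.mean((y - X @ beta) ** 2))

# Each extra degree can only lower (or keep) the training MSE.
assert all(a >= b - 1e-9 for a, b in zip(mses, mses[1:]))
```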


Suppose your matrix $X$ consists of a single column vector $x_1$. That is, a single feature.

In linear regression, $\beta=(X^TX)^{-1}X^Ty=(x_1^Tx_1)^{-1}x_1^Ty$.

If you add a new variable $x_2$ of degree 2, obtaining $X_2$, then $$ y-X_2(X_2^TX_2)^{-1}X_2^Ty = y-x_1(x_1^Tx_1)^{-1}x_1^Ty -\tilde x_2(\tilde x_2^T\tilde x_2)^{-1}\tilde x_2^T\tilde y_1 $$ where $\tilde y_1 = y-x_1(x_1^Tx_1)^{-1}x_1^Ty$ is the residual from the first regression and $\tilde x_2 = x_2-x_1(x_1^Tx_1)^{-1}x_1^Tx_2$ is $x_2$ orthogonalized against $x_1$ (a Gram–Schmidt step; when $x_1^Tx_2=0$ this is just $x_2$ itself).
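As a numerical sanity check of this decomposition (a sketch with synthetic data, not part of the original answer): regress stepwise, orthogonalizing the new column first, and compare with the joint fit.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
x1 = rng.standard_normal(n)
x2 = x1 ** 2
y = rng.standard_normal(n)

def proj(a, v):
    # Orthogonal projection of v onto the span of column vector a.
    return a * (a @ v) / (a @ a)

# Residual after regressing y on x1 alone.
y1 = y - proj(x1, y)
# Orthogonalize x2 against x1 (Gram–Schmidt step).
x2t = x2 - proj(x1, x2)

# Residual after regressing y on [x1, x2] jointly.
X2 = np.column_stack([x1, x2])
r = y - X2 @ np.linalg.lstsq(X2, y, rcond=None)[0]

# The joint residual equals the stepwise residual.
assert np.allclose(r, y1 - proj(x2t, y1))
```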

For any matrix $X$ of full column rank, $\pi_X(y)=X(X^TX)^{-1}X^Ty$ is the orthogonal projection of $y$ onto the space spanned by the columns of $X$. Orthogonal projections are contractions, $\|\pi_X(y)\|_2 \leq \|y\|_2$, and by Pythagoras $\|y-\pi_X(y)\|_2^2 = \|y\|_2^2 - \|\pi_X(y)\|_2^2$, so subtracting a further projection of the residual can only shrink its norm.
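Both projection facts can be verified on random data (again my own sketch, not from the post):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((15, 3))
y = rng.standard_normal(15)

# Orthogonal projection of y onto the column space of X.
p = X @ np.linalg.lstsq(X, y, rcond=None)[0]

# Contraction: the projection is never longer than y itself...
assert np.linalg.norm(p) <= np.linalg.norm(y)
# ...and Pythagoras splits ||y||^2 into fit plus residual.
assert np.isclose(y @ y, p @ p + (y - p) @ (y - p))
```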

This means that $$ \|y-X_2(X_2^TX_2)^{-1}X_2^Ty\|_2^2 \leq \|y-X(X^TX)^{-1}X^Ty\|_2^2 $$ Equality happens only when $x_1(x_1^Tx_1)^{-1}x_1^Tx_2=x_2$, that is, when the addition of column $x_2$ does not increase the rank of $X$.
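The inequality and its equality case can both be seen in a short experiment (a sketch with synthetic data, assuming nothing beyond the post): a genuinely new column shrinks the residual sum of squares, while a redundant one leaves it unchanged.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 25
x1 = rng.standard_normal(n)
y = rng.standard_normal(n)

def rss(X, y):
    # Residual sum of squares of the least-squares fit.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

# A new column (x1^2) that raises the rank can only shrink the RSS...
assert rss(np.column_stack([x1, x1 ** 2]), y) <= rss(x1[:, None], y)
# ...but a redundant column (a multiple of x1) leaves it unchanged.
assert np.isclose(rss(np.column_stack([x1, 2 * x1]), y), rss(x1[:, None], y))
```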

More generally, if $X_n$ denotes the matrix resulting from adding polynomial variables up to degree $n$, then $$ y-X_n(X_n^TX_n)^{-1}X_n^Ty = y-X_{n-1}(X_{n-1}^TX_{n-1})^{-1}X_{n-1}^Ty -\tilde x_n(\tilde x_n^T\tilde x_n)^{-1}\tilde x_n^T\tilde y_{n-1} $$ where $\tilde y_{n-1}$ is the residual from the degree-$(n-1)$ fit and $\tilde x_n$ is the degree-$n$ column orthogonalized against the columns of $X_{n-1}$, and the same reasoning applies.

If the inverse does not exist, one can replace $(X^TX)^{-1}X^T$ with the Moore–Penrose pseudoinverse, $X^+$.
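A quick illustration of the rank-deficient case (a sketch of my own, using NumPy's `pinv`): even when $X^TX$ is singular, $XX^+y$ is still the orthogonal projection onto the column space of $X$.

```python
import numpy as np

rng = np.random.default_rng(4)
x1 = rng.standard_normal(10)
# Rank-deficient design: the second column duplicates the first,
# so X^T X is singular and (X^T X)^{-1} does not exist.
X = np.column_stack([x1, x1])
y = rng.standard_normal(10)

# The pseudoinverse still yields the orthogonal projection onto col(X).
p = X @ np.linalg.pinv(X) @ y
p_ref = x1 * (x1 @ y) / (x1 @ x1)  # projection onto span(x1) = col(X)
assert np.allclose(p, p_ref)
```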
