How can I show that the variance of local polynomial regression is
increasing with the degree of the polynomial (Exercise 6.3 in Elements
of Statistical Learning, second edition)?
This question has been asked before, but the answer there just states that it follows easily.
More precisely, we consider $y_{i}=f(x_{i})+\epsilon_{i}$ with the $\epsilon_{i}$
being independent with standard deviation $\sigma$.
The estimator is given by
$$
\hat{f}(x_{0})=\left(\begin{array}{ccccc}
1 & x_{0} & x_{0}^{2} & \dots & x_{0}^{d}\end{array}\right)\left(\begin{array}{c}
\alpha\\
\beta_{1}\\
\vdots\\
\beta_{d}
\end{array}\right)
$$
for $\alpha,\beta_{1},\dots,\beta_{d}$ solving the following weighted
least squares problem
$$
\min\left(y-\underbrace{\left(\begin{array}{ccccc}
1 & x_{1} & x_{1}^{2} & \dots & x_{1}^{d}\\
\vdots &  &  &  & \vdots\\
1 & x_{n} & x_{n}^{2} & \dots & x_{n}^{d}
\end{array}\right)}_{X}\left(\begin{array}{c}
\alpha\\
\beta_{1}\\
\vdots\\
\beta_{d}
\end{array}\right)\right)^{t}W\left(y-X\left(\begin{array}{c}
\alpha\\
\beta_{1}\\
\vdots\\
\beta_{d}
\end{array}\right)\right)
$$
for $W=\text{diag}\left(K(x_{0},x_{i})\right)_{i=1,\dots,n}$ with
$K$ being the regression kernel and $y=\left(\begin{array}{ccc}
y_{1} & \dots & y_{n}\end{array}\right)^{t}$. The solution to the weighted least squares
problem can be written as
$$
\left(\begin{array}{cccc}
\alpha & \beta_{1} & \dots & \beta_{d}\end{array}\right)^{t}=\left(X^{t}WX\right)^{-1}X^{t}Wy.
$$
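As a quick sanity check of this closed form, here is a minimal numerical sketch (Python/NumPy; the Gaussian kernel, bandwidth, sample size, and degree below are arbitrary choices of mine, not part of the exercise):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, x0 = 50, 3, 0.3
x = rng.uniform(-1.0, 1.0, n)
y = np.sin(2 * x) + 0.1 * rng.standard_normal(n)

X = np.vander(x, d + 1, increasing=True)        # rows (1, x_i, x_i^2, ..., x_i^d)
w = np.exp(-(x - x0) ** 2 / (2 * 0.2 ** 2))     # K(x0, x_i): Gaussian kernel, bandwidth 0.2
W = np.diag(w)

# closed form (X^t W X)^{-1} X^t W y
beta_closed = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# the same coefficients from a generic least squares solve on sqrt(W)-scaled data
beta_lstsq, *_ = np.linalg.lstsq(np.sqrt(w)[:, None] * X, np.sqrt(w) * y, rcond=None)

print(np.allclose(beta_closed, beta_lstsq))     # True
```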
Thus, for $l(x_{0})=\left(\begin{array}{ccccc}
1 & x_{0} & x_{0}^{2} & \dots & x_{0}^{d}\end{array}\right)\left(X^{t}WX\right)^{-1}X^{t}W$ we obtain
$$
\hat{f}(x_{0})=l(x_{0})y,
$$
implying that
$$
\text{Var }\hat{f}(x_{0})=\sigma^{2}\left\Vert l(x_{0})\right\Vert ^{2}=\sigma^{2}\left(\begin{array}{ccccc}
1 & x_{0} & x_{0}^{2} & \dots & x_{0}^{d}\end{array}\right)\left(X^{t}WX\right)^{-1}X^{t}W^{2}X\left(X^{t}WX\right)^{-1}\left(\begin{array}{ccccc}
1 & x_{0} & x_{0}^{2} & \dots & x_{0}^{d}\end{array}\right)^{t}.
$$
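As a numerical illustration of what I am trying to prove, one can evaluate this variance formula at a fixed $x_{0}$ for increasing degree $d$ (again a minimal sketch; kernel, bandwidth, and design points are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, x0, sigma2 = 100, 0.25, 1.0
x = rng.uniform(-1.0, 1.0, n)
w = np.exp(-(x - x0) ** 2 / (2 * 0.3 ** 2))     # kernel weights K(x0, x_i)
W = np.diag(w)

def var_fhat(d):
    """sigma^2 * ||l(x0)||^2 for the local polynomial fit of degree d at x0."""
    X = np.vander(x, d + 1, increasing=True)
    z = np.vander([x0], d + 1, increasing=True).ravel()   # (1, x0, ..., x0^d)
    A = np.linalg.inv(X.T @ W @ X)
    return sigma2 * z @ A @ X.T @ W @ W @ X @ A @ z

# Exercise 6.3 claims this sequence is non-decreasing in d
print([round(var_fhat(d), 4) for d in range(6)])
```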
My approach: an induction using the formula for the inverse of a block matrix, but I did not succeed.
The paper Multivariate Locally Weighted Least Squares Regression
by D. Ruppert and M. P. Wand derives an asymptotic expression for the variance as $n\rightarrow\infty$ in Theorem 4.1, but it is not clear from that expression that the variance is increasing in the degree.
Best Answer
If the variance increases for every weighting matrix $W$, then this also holds for $W=I$. Henceforth, I will use the notation of OLS. We have
$$
y=X\beta+u\textrm{,}\qquad\textrm{with}\qquad X\in\mathbb{R}^{n\times k}\textrm{,}\quad y,u\in\mathbb{R}^{n}\textrm{,}\quad\beta\in\mathbb{R}^{k}\textrm{,}
$$
under the standard assumptions. For a polynomial regression, let $x:=\begin{pmatrix}x_{1},x_{2},\ldots,x_{n}\end{pmatrix}^{T}\in\mathbb{R}^{n}$ be the vector of design points; then we have
$$
X:=\begin{bmatrix}x^{0},x^{1},x^{2},\ldots,x^{k-1}\end{bmatrix}\textrm{,}
$$
where exponentiation is understood element-wise.
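For illustration only, this element-wise exponentiation is what `numpy.vander` with `increasing=True` produces (arbitrary sample data):

```python
import numpy as np

x = np.array([0.5, 1.0, 2.0])
k = 3
X = np.vander(x, k, increasing=True)   # columns x^0, x^1, ..., x^{k-1}
# X == [[1.0, 0.5, 0.25],
#       [1.0, 1.0, 1.00],
#       [1.0, 2.0, 4.00]]
```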
The OLS estimate for the polynomial weights is
$$
\hat{\beta}:=\left(X^{T}X\right)^{-1}X^{T}y\textrm{.}
$$
For any $t\in\mathbb{R}$ we can set
$$
z:=\begin{pmatrix}t^{0}\\t^{1}\\t^{2}\\\vdots\\t^{k-1}\end{pmatrix}\in\mathbb{R}^{k}\textrm{.}
$$
An estimate of $y$ at $t$ is then given by $\hat{y}_{t}:=z^{T}\hat{\beta}$. For the variance of $\hat{y}_{t}$ we need to know its expected value:
\begin{align}
\mathbb{E}\left[\hat{y}_{t}\right] &= \mathbb{E}\left[z^{T}\hat{\beta}\right]=\mathbb{E}\left[z^{T}\left(X^{T}X\right)^{-1}X^{T}y\right] \\
&=\mathbb{E}\left[z^{T}\left(X^{T}X\right)^{-1}X^{T}X\beta+z^{T}\left(X^{T}X\right)^{-1}X^{T}u\right] \\
&=z^{T}\beta+z^{T}\left(X^{T}X\right)^{-1}X^{T}\mathbb{E}\left[u\right]=z^{T}\beta\textrm{.}
\end{align}
From this calculation we see that
$$
\hat{y}_{t}-\mathbb{E}\left[\hat{y}_{t}\right]=z^{T}\left(X^{T}X\right)^{-1}X^{T}u\textrm{.}
$$
Now we can calculate the variance of $\hat{y}_{t}$:
\begin{align}
\operatorname{Var}\left[\hat{y}_{t}\right] &= \mathbb{E}\left[\left(\hat{y}_{t}-\mathbb{E}\left[\hat{y}_{t}\right]\right)\left(\hat{y}_{t}-\mathbb{E}\left[\hat{y}_{t}\right]\right)^{T}\right] \\
&= \mathbb{E}\left[\left(z^{T}\left(X^{T}X\right)^{-1}X^{T}u\right)\left(z^{T}\left(X^{T}X\right)^{-1}X^{T}u\right)^{T}\right] \\
&= \mathbb{E}\left[\left(z^{T}\left(X^{T}X\right)^{-1}X^{T}u\right)\left(u^{T}X\left(X^{T}X\right)^{-1}z\right)\right] \\
&= z^{T}\left(X^{T}X\right)^{-1}X^{T}\mathbb{E}\left[uu^{T}\right]X\left(X^{T}X\right)^{-1}z \\
&= \sigma^{2}z^{T}\left(X^{T}X\right)^{-1}z\textrm{.}
\end{align}
If we increase $k\mapsto k+1$, we will have
$$
X_{*}:=\begin{bmatrix}x^{0},x^{1},x^{2},\ldots,x^{k-1},x^{k}\end{bmatrix}\in\mathbb{R}^{n\times\left(k+1\right)}\textrm{,}
$$
and therefore $\hat{\beta}_{*}\in\mathbb{R}^{k+1}$ and
$$
z_{*}:=\begin{pmatrix}t^{0}\\t^{1}\\t^{2}\\\vdots\\t^{k-1}\\t^{k}\end{pmatrix}\in\mathbb{R}^{k+1}\textrm{.}
$$
The variance of $\hat{y}_{t}^{*}$ now involves the $\left(k+1\right)\times\left(k+1\right)$ matrix $\left(X_{*}^{T}X_{*}\right)^{-1}$,
\begin{equation}
\operatorname{Var}\left[\hat{y}_{t}^{*}\right]=\sigma^{2}z_{*}^{T}\left(X_{*}^{T}X_{*}\right)^{-1}z_{*}\textrm{,}
\end{equation}
which we need to compare to $\operatorname{Var}\left[\hat{y}_{t}\right]$, which involves the $k\times k$ matrix $\left(X^{T}X\right)^{-1}$.
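A direct numerical comparison of $\sigma^{2}z^{T}\left(X^{T}X\right)^{-1}z$ across $k$ already suggests the result (a minimal sketch; the data and the evaluation point $t$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, t, sigma2 = 30, 0.4, 1.0
x = rng.uniform(-1.0, 1.0, n)

def var_yhat(k):
    """sigma^2 * z^T (X^T X)^{-1} z for a polynomial with k coefficients, evaluated at t."""
    X = np.vander(x, k, increasing=True)
    z = np.vander([t], k, increasing=True).ravel()
    return sigma2 * z @ np.linalg.solve(X.T @ X, z)

# non-decreasing in k, i.e. Var[y_t] <= Var[y_t^*] when a column is added
print([round(var_yhat(k), 4) for k in range(1, 8)])
```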
Since we have inverses, the Schur complement will help:
\begin{equation}
\begin{pmatrix}A & B\\C & D\end{pmatrix}^{-1}=\begin{pmatrix}\left(A-BD^{-1}C\right)^{-1} & -\left(A-BD^{-1}C\right)^{-1}BD^{-1}\\-D^{-1}C\left(A-BD^{-1}C\right)^{-1} & D^{-1}+D^{-1}C\left(A-BD^{-1}C\right)^{-1}BD^{-1}\end{pmatrix}\textrm{.}
\end{equation}
Since we have
$$
X_{*}:=\begin{bmatrix}X,x^{k}\end{bmatrix}\qquad\textrm{and}\qquad z_{*}:=\begin{pmatrix}z\\t^{k}\end{pmatrix}\textrm{,}
$$
we can write, with the abbreviation $q:=x^{k}$,
\begin{equation}
\operatorname{Var}\left[\hat{y}_{t}^{*}\right]=\sigma^{2}\begin{pmatrix}z^{T},t^{k}\end{pmatrix}\begin{pmatrix}X^{T}X & X^{T}q\\q^{T}X & q^{T}q\end{pmatrix}^{-1}\begin{pmatrix}z\\t^{k}\end{pmatrix}\textrm{.}
\end{equation}
We can now invert this block matrix using the aforementioned Schur complement and get
\begin{align}
\operatorname{Var}\left[\hat{y}_{t}^{*}\right] &= \sigma^{2}\begin{pmatrix}z^{T},t^{k}\end{pmatrix}\begin{pmatrix}\left(X^{T}X-X^{T}q\left(q^{T}q\right)^{-1}q^{T}X\right)^{-1} & B_{*}\\B_{*}^{T} & D_{*}\end{pmatrix}\begin{pmatrix}z\\t^{k}\end{pmatrix} \\
&= \sigma^{2}\left(z^{T}\left(X^{T}X-X^{T}q\left(q^{T}q\right)^{-1}q^{T}X\right)^{-1}z+t^{k}z^{T}B_{*}+t^{k}B_{*}^{T}z+t^{2k}D_{*}\right)\textrm{.}
\end{align}
The matrix $X^{T}q\left(q^{T}q\right)^{-1}q^{T}X$ is positive semi-definite, because it can be written as
$$
X^{T}q\left(q^{T}q\right)^{-1}q^{T}X=\left(q^{T}q\right)^{-1}X^{T}qq^{T}X
$$
and $qq^{T}$ is a rank-$1$ matrix whose only non-vanishing eigenvalue equals $q^{T}q$. The matrix $q\left(q^{T}q\right)^{-1}q^{T}$ is the projection onto the subspace spanned by $q=x^{k}$, so $X^{T}X\succeq X^{T}q\left(q^{T}q\right)^{-1}q^{T}X$, i.e. the difference $X^{T}X-X^{T}q\left(q^{T}q\right)^{-1}q^{T}X$ is positive semi-definite. Inverting preserves positive semi-definiteness, and moreover
$$
X^{T}X\succeq X^{T}X-X^{T}q\left(q^{T}q\right)^{-1}q^{T}X\implies\left(X^{T}X\right)^{-1}\preceq\left(X^{T}X-X^{T}q\left(q^{T}q\right)^{-1}q^{T}X\right)^{-1}\textrm{.}
$$
So in
\begin{equation}
\operatorname{Var}\left[\hat{y}_{t}^{*}\right]=\sigma^{2}z^{T}\left(X^{T}X-X^{T}q\left(q^{T}q\right)^{-1}q^{T}X\right)^{-1}z+2\sigma^{2}t^{k}z^{T}B_{*}+\sigma^{2}t^{2k}D_{*}\textrm{,}
\end{equation}
the first term is already at least $\operatorname{Var}\left[\hat{y}_{t}\right]=\sigma^{2}z^{T}\left(X^{T}X\right)^{-1}z$. Writing out $B_{*}$ and $D_{*}$ explicitly (equivalently, inverting the block matrix via the Schur complement of $X^{T}X$ instead of $q^{T}q$), the three terms combine into
\begin{equation}
\operatorname{Var}\left[\hat{y}_{t}^{*}\right]=\operatorname{Var}\left[\hat{y}_{t}\right]+\frac{\sigma^{2}}{s}\left(t^{k}-q^{T}X\left(X^{T}X\right)^{-1}z\right)^{2}\qquad\textrm{with}\qquad s:=q^{T}q-q^{T}X\left(X^{T}X\right)^{-1}X^{T}q>0\textrm{,}
\end{equation}
where $s$ is the Schur complement of $X^{T}X$ and is positive whenever $X_{*}$ has full column rank (which is needed for $\left(X_{*}^{T}X_{*}\right)^{-1}$ to exist anyway). So with increasing polynomial degree the variance is non-decreasing.
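The complete-square identity at the end is also easy to check numerically (a minimal sketch with arbitrary random data; $q=x^{k}$ is the added column):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, t = 40, 4, 0.7
x = rng.uniform(-1.0, 1.0, n)

X = np.vander(x, k, increasing=True)        # k columns: x^0, ..., x^{k-1}
q = x ** k                                  # the added column
Xs = np.column_stack([X, q])                # k + 1 columns
z = np.vander([t], k, increasing=True).ravel()
zs = np.append(z, t ** k)

A_inv = np.linalg.inv(X.T @ X)
s = q @ q - q @ X @ A_inv @ X.T @ q         # Schur complement of X^T X
lhs = zs @ np.linalg.inv(Xs.T @ Xs) @ zs - z @ A_inv @ z
rhs = (t ** k - q @ X @ A_inv @ z) ** 2 / s

print(np.isclose(lhs, rhs), lhs >= 0)       # True True
```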