# Solved – Derivation of EPE for linear regression in “The Elements of Statistical Learning”

My question is about "The Elements of Statistical Learning" (yup, the one).
Right now I am stuck in Chapter 2, at the part where the authors derive the EPE for linear regression (somewhat related to "Confusion about derivation of regression function", but I have a more detailed investigation here =)).
Here it is:

Ok, I do not really get where they derived it from, but that is not so important, because I've found notes for the book (A Solution Manual and Notes for: The Elements of Statistical Learning).

There the authors try to explain how this formula was derived, but I do not follow that either =)

So here is their logic (just a part; the rest is not important right now):

I do not really understand a lot of things here. Let me first explain how I see the variables:

\$T\$ (\$\tau\$) – just all the training samples, basically i.i.d.; its distribution would be the likelihood of all samples

\$X\$ – input RV

\$Y\$ – output RV

\$x_0\$ – a selected input (independent of \$T\$, just some chosen \$x\$)

\$y_0\$ – the output generated from \$x_0\$ (an RV, because it depends on the normal error \$\epsilon\$)

1. What is \$E_{y_0|x_0}\$, and how does it differ from \$E_{Y|X}\$?
I mean, \$X\$ is an input and \$Y\$ is an output, and they have the quite straightforward relation \$Y = \beta X + \epsilon\$. So \$E_{Y|X}\$ is the expectation of the output given the input. And \$E_{y_0|x_0}\$ seems pretty much the same to me, though I do understand that \$y_0\$ is some output given that we have chosen \$x_0\$ (which seems to be a constant here, even if conditioning on a constant feels like a weird thing to do). But maybe it is something broader, I don't know.
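As a sanity check on the relation \$E[Y \mid X = x_0] = \beta x_0\$ in the model above, here is a minimal simulation sketch (all numbers — \$\beta\$, \$\sigma\$, \$x_0\$ — are toy values I chose, not from the book):

```python
import random

# Toy model: Y = beta * X + eps, eps ~ N(0, sigma^2).
# For a fixed x0, averaging many draws of y0 should recover
# the conditional mean E[Y | X = x0] = beta * x0.
random.seed(0)
beta, sigma, x0 = 2.0, 1.0, 1.5

draws = [beta * x0 + random.gauss(0.0, sigma) for _ in range(200_000)]
cond_mean = sum(draws) / len(draws)

print(round(cond_mean, 2))  # close to beta * x0 = 3.0
```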

2. Why is this true: \$E_T = E_X E_{Y|X}\$?

3. And the one that blew my head off: \$E_T U_1 = U_1 E_T\$.
What does it mean? =) I understand expectation for a random variable, but what is \$U_1 E_T\$, and how am I supposed to read it? It seems like \$E_T U_1\$ is just a constant given that we've chosen \$y_0\$ and \$x_0\$, but I can be totally wrong here.

I have many more questions, but I will stop here for now. I have an ok background in stats and linear algebra, and I can follow undergrad-level books (without measure theory, though). But I really can't grasp how the authors use expectations.


#### Best Answer

I am trying to answer the first question. Suppose \$x_0 \in \mathbb{R}^p\$ and \$y_0 \in \mathbb{R}\$. \$T\$ is the space of training samples, i.e. sets of pairs \$(x_i, y_i)\$ such that \$y_i = x_i^T\beta + \epsilon\$. These sets are exactly the training data, and starting from them a linear estimate of \$y\$ as a function of \$x\$ is calculated with the linear regression method. Let \$g:\mathbb{R} \mapsto \mathbb{R}\$ be a function. Then \$E_{y_0|x_0}\,g = \int g(y)\,\rho(y \mid x_0)\,\mathrm{d}y\$, where \$\rho\$ is the conditional probability density of \$y\$ when \$x = x_0\$, i.e. the density of \$y = x_0^T\beta + \epsilon\$ with \$\epsilon \sim N(0,\sigma^2)\$. In our case \$g(y) = E_T(y - \hat{y})^2\$.
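A Monte Carlo sketch of that integral: drawing \$y_0\$ from \$\rho(\cdot \mid x_0)\$ and averaging \$g(y_0)\$ approximates \$\int g(y)\,\rho(y \mid x_0)\,\mathrm{d}y\$. Here \$g(y) = (y - \hat{y})^2\$ for a fixed prediction \$\hat{y}\$, and every number (\$\beta\$, \$\sigma\$, \$x_0\$, \$\hat{y}\$) is a made-up toy value:

```python
import random

# E_{y0|x0} g = integral of g(y) * rho(y | x0) dy, where
# y0 = beta * x0 + eps, eps ~ N(0, sigma^2).
random.seed(2)
beta, sigma, x0 = 2.0, 1.0, 1.5
y_hat = 2.9                       # some fixed prediction at x0 (toy value)
g = lambda y: (y - y_hat) ** 2

# Monte Carlo: draw y0 from rho(. | x0), average g(y0).
n = 300_000
mc = sum(g(beta * x0 + random.gauss(0.0, sigma)) for _ in range(n)) / n

# Closed form for comparison: E(y0 - y_hat)^2 = sigma^2 + (beta*x0 - y_hat)^2.
exact = sigma**2 + (beta * x0 - y_hat) ** 2

print(round(mc, 2), round(exact, 2))  # the two agree
```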

For the second question, \$X\$ and \$Y\$ denote the ordered set of inputs \$x_i \in \mathbb{R}^p\$ and the set of corresponding outputs \$y_i\$, respectively.
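The identity \$E_T = E_X E_{Y|X}\$ behind question 2 is the law of total expectation: averaging over the joint distribution equals first averaging over \$Y\$ given \$X\$, then over \$X\$. A small discrete check (the joint distribution and \$f\$ are toy values I made up):

```python
from itertools import product

# Law of total expectation: E_{X,Y}[f(X,Y)] = E_X[ E_{Y|X}[f(X,Y)] ].
p_x = {0: 0.4, 1: 0.6}                 # marginal P(X = x)
p_y_given_x = {0: {0: 0.5, 1: 0.5},    # conditional P(Y = y | X = x)
               1: {0: 0.2, 1: 0.8}}
f = lambda x, y: (x + 2 * y) ** 2

# Direct expectation over the joint distribution P(x, y) = P(x) P(y | x).
joint = sum(p_x[x] * p_y_given_x[x][y] * f(x, y)
            for x, y in product(p_x, [0, 1]))

# Iterated expectation: inner E_{Y|X}, then outer E_X.
inner = {x: sum(p_y_given_x[x][y] * f(x, y) for y in [0, 1]) for x in p_x}
iterated = sum(p_x[x] * inner[x] for x in p_x)

print(joint, iterated)  # the two agree
```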

For the third question, \$E_T U_1 = U_1 E_T\$ holds when \$U_1\$ is a constant with respect to the integration variable in the expectation integral, so it simply factors out by linearity.
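That factoring is just \$E[c\,Z] = c\,E[Z]\$ for a constant \$c\$. A quick numeric illustration (the value of \$U_1\$ and the distribution of \$Z\$ are arbitrary toy choices):

```python
import random

# If U1 does not depend on the variable being averaged over (here the
# training sample T), it is a constant under E_T and factors out:
# E_T[U1 * Z] = U1 * E_T[Z].
random.seed(1)
u1 = 3.0                                         # "constant" w.r.t. T
z_draws = [random.gauss(0.0, 1.0) ** 2 for _ in range(100_000)]

lhs = sum(u1 * z for z in z_draws) / len(z_draws)   # E[U1 * Z]
rhs = u1 * (sum(z_draws) / len(z_draws))            # U1 * E[Z]

print(round(lhs, 6), round(rhs, 6))  # equal up to floating-point rounding
```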
