# Solved – Derivation of EPE for linear regression in “The elements of statistical learning “

My question is about "The elements of statistical learning" book (yup, the one).
Right now I am kinda stuck on second chapter at part, where they derive EPE for linear regression (Somewhat related to Confusion about derivation of regression function , but I have more a detailed investigation here =)).
Here it is:

Ok, I do not really get where did they get it from, but it is not so important, because I've found notes for the book from authors (A Solution Manual and Notes for: The Elements of Statistical Learning).

There authors try to explain how this formula was derived. But I do not get that either =)

So here is the logic that they have (Just a part, other is not important right now):

I do not really get a lot of things here. Let me first explain how I see variables here:

\$ T(tau)\$ – just all samples, basically iid, and distribution would be likelihood of all samples

\$X\$ – input RV

\$Y\$ – output RV

\$x_0\$ – is selected input (independent of \$T\$, just some random \$x\$)

\$y_0\$ – output calculated based on \$x_0\$ (RV, because it depends on e – normal error)

1. What is \$E_{{y_0}|{x_0}}\$, how does it differ from \$E_{Y|X}\$?
I mean \$X\$ an is input and \$Y\$ is an output, which have quite straightforward relation \$Y = beta X + e\$. So \$E_{Y|X}\$ is expectation of output based on input. And \$E_{{y_0}|{x_0}}\$ seems pretty much the same to me, though I kinda understand, that \$y_0\$ is some output given that we have chosen \$x_0\$ (as it seems it is constant here, though I guess it is weird condition on constant). But, maybe, it is more broader thing, don't know.

2. How this is true: \$E_T =E_XE_{Y|X} \$

3. And the one that blew my head off: \$E_TU_1 = U_1E_T\$.
What does it mean? =) I understand expectation for Random Variable, but what is \$U_1E_T\$, how am I supposed to understand it? Seems like \$E_TU_1\$ is just constant given that we've chosen \$y_0\$ and \$x_0\$, but I can be totally wrong here.

I have many more questions, but for now I will stop here. I have ok background in stats and linear algebra, I understand undergrad level books (without measure theory though). But I really can't grasp how authors use expectations.

Contents