My question is about "The elements of statistical learning" book (yup, the one).

Right now I am stuck on the part of Chapter 2 where they derive the EPE for linear regression (somewhat related to "Confusion about derivation of regression function", but I have a more detailed investigation here =)).

Here it is:

Ok, I do not really get where they got it from, but that is not so important, because I've found notes for the book (A Solution Manual and Notes for: The Elements of Statistical Learning).

There the authors try to explain how this formula is derived, but I do not get that either =)

So here is their logic (just a part; the rest is not important right now):

I do not really get a lot of things here. Let me first explain how I see variables here:

$T$ (a realization is written $\tau$) – the training set: just all samples, basically i.i.d., whose joint distribution is the likelihood of all samples

$X$ – input RV

$Y$ – output RV

$x_0$ – the selected input (independent of $T$, just some fixed $x$)

$y_0$ – the output generated from $x_0$ (an RV, because it depends on $\varepsilon$, the normal error)

What is $E_{y_0|x_0}$, and how does it differ from $E_{Y|X}$?

I mean, $X$ is an input and $Y$ is an output, and they have the quite straightforward relation $Y = X^T\beta + \varepsilon$. So $E_{Y|X}$ is the expectation of the output given the input. And $E_{y_0|x_0}$ seems pretty much the same to me, though I kinda understand that $y_0$ is some output given that we have chosen $x_0$ (which seems to be a constant here, and conditioning on a constant feels weird). But maybe it is a broader thing, I don't know.

- Also, how is this true: $E_T = E_X E_{Y|X}$?

- And the one that blew my head off: $E_T U_1 = U_1 E_T$.

What does it mean? =) I understand the expectation of a random variable, but what is $U_1 E_T$? How am I supposed to read it? It seems like $E_T U_1$ is just a constant once we've chosen $y_0$ and $x_0$, but I could be totally wrong here.

I have many more questions, but for now I will stop here. I have an ok background in stats and linear algebra, and I understand undergrad-level books (without measure theory, though). But I really can't grasp how the authors use expectations.


#### Best Answer

I am trying to answer the first question. Suppose $x_0 \in \mathbb{R}^p$ and $y_0 \in \mathbb{R}$. $T$ is the space of training sets, i.e. sets of pairs $(x_i, y_i)$ such that $y_i = x_i^T\beta + \varepsilon_i$. Such a set is exactly the training data, and starting from it a linear estimate of $y$ as a function of $x$ is calculated with the linear regression method. Let $g:\mathbb{R} \mapsto \mathbb{R}$ be a function. Then

$$E_{y_0|x_0}\,g = \int_A g(y)\,\rho(y \mid x_0)\,\mathrm{d}y, \qquad A = \{\, y : y = x_0^T\beta + \varepsilon,\ \varepsilon \sim N(0, \sigma^2) \,\},$$

where $\rho$ is the conditional probability density function of $y$ when $x = x_0$. In our case $g(y) = E_T(y - \hat{y})^2$.
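To make this concrete, here is a minimal Monte Carlo sketch (my own illustration, not from the book or the notes; the choices of $n$, $p$, $\beta$, $\sigma$, and $x_0$ are arbitrary). It estimates $E_{y_0|x_0} E_T (y_0 - \hat{y}_0)^2$ directly by simulating many training sets, and checks it against the decomposition $\sigma^2 + \mathrm{bias}^2 + \mathrm{Var}_T(\hat{y}_0)$:

```python
import numpy as np

# Hypothetical setup: arbitrary true beta, noise level, and query point x0.
rng = np.random.default_rng(0)
n, p, sigma = 50, 3, 1.0
beta = np.array([1.0, -2.0, 0.5])
x0 = np.array([0.3, -0.1, 0.8])

# Simulate many training sets T and record the OLS prediction at x0 for each.
n_sets = 5000
yhat0 = np.empty(n_sets)
for t in range(n_sets):
    X = rng.normal(size=(n, p))
    y = X @ beta + sigma * rng.normal(size=n)        # y_i = x_i^T beta + eps_i
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS fit on this training set
    yhat0[t] = x0 @ beta_hat                         # prediction at the fixed x0

# Decomposition: EPE(x0) = sigma^2 (irreducible) + bias^2 + Var_T(yhat0).
bias2 = (yhat0.mean() - x0 @ beta) ** 2
epe_decomposed = sigma**2 + bias2 + yhat0.var()

# Direct estimate: draw an independent y0 | x0 alongside each training set.
y0 = x0 @ beta + sigma * rng.normal(size=n_sets)
epe_direct = np.mean((y0 - yhat0) ** 2)

print(epe_decomposed, epe_direct)  # the two estimates should be close
```

Since the model is correctly specified, the bias term is essentially zero here and the EPE is only slightly above $\sigma^2$, the excess being the variance contributed by estimating $\beta$ from a finite training set.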

For the second question: $X$ and $Y$ denote the ordered set of inputs in $\mathbb{R}^p$ and, respectively, the set of corresponding outputs $y$.
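The identity $E_T = E_X E_{Y|X}$ is, as I read it, the law of total expectation (tower property) applied to the joint distribution of a training pair; a sketch of the standard argument (my own restatement, not from the notes):

```latex
% For any integrable f(X, Y), factor the joint density p(x,y) = p(y|x) p(x):
\begin{aligned}
E\,[f(X,Y)]
  &= \iint f(x,y)\, p(x,y)\, \mathrm{d}y\, \mathrm{d}x \\
  &= \int \left( \int f(x,y)\, p(y \mid x)\, \mathrm{d}y \right) p(x)\, \mathrm{d}x \\
  &= E_X \big[\, E_{Y|X}[\, f(X,Y) \mid X \,] \,\big].
\end{aligned}
```

Applying this to each i.i.d. pair in the training set is what lets the authors write the expectation over $T$ as $E_X E_{Y|X}$.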

For the third question: $E_T U_1 = U_1 E_T$ holds when $U_1$ is a constant with respect to the variable of integration in the expectation integral.
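Spelled out (my own elaboration): if $U_1$ does not depend on the training set, it is a constant inside the integral defining $E_T$ and can be pulled out by linearity:

```latex
% U_1 constant w.r.t. tau, g any integrable function of the training set:
E_T\,[\, U_1\, g(T) \,]
  = \int U_1\, g(\tau)\, p(\tau)\, \mathrm{d}\tau
  = U_1 \int g(\tau)\, p(\tau)\, \mathrm{d}\tau
  = U_1\, E_T\,[\, g(T) \,].
```

So "$E_T U_1 = U_1 E_T$" is operator-style shorthand for this fact, not a claim that two different quantities happen to be equal.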
