I'm struggling with the mathematics behind linear regression. Below I have pasted the text from the book Pattern Recognition and Machine Learning (p. 46), where the author derives the regression function $\mathbb{E}_{t}[t \mid \mathbf{x}]$. I want to understand the steps from equation (2) to the final result. Could somebody please point me to the concepts from the calculus of variations that I should study (links are welcome)?

The average, or expected, loss is given by

$$
\mathbb{E}[L] = \int \int L(t, y(\mathbf{x})) \, p(\mathbf{x}, t) \, d\mathbf{x} \, dt.
\tag{1}
$$

A common choice of loss function in linear regression is the squared loss given by $L(t, y(\mathbf{x})) = \{ y(\mathbf{x}) - t \}^{2}$. In this case, the expected loss can be written as

$$
\mathbb{E}[L] = \int \int \{ y(\mathbf{x}) - t \}^{2} \, p(\mathbf{x}, t) \, d\mathbf{x} \, dt.
\tag{2}
$$

Our goal is to choose $y(\mathbf{x})$ so as to minimize $\mathbb{E}[L]$. We can do this using the calculus of variations to give

$$
\dfrac{\delta \mathbb{E}[L]}{\delta y(\mathbf{x})} = 2 \int \{ y(\mathbf{x}) - t \} \, p(\mathbf{x}, t) \, dt = 0.
\tag{3}
$$

Solving for $y(\mathbf{x})$, and using the sum and product rules of probability, we obtain

$$
y(\mathbf{x}) = \dfrac{\int t \, p(\mathbf{x}, t) \, dt}{p(\mathbf{x})} = \int t \, p(t \mid \mathbf{x}) \, dt = \mathbb{E}_{t}[t \mid \mathbf{x}].
\tag{4}
$$
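A quick numerical sanity check of (4) can make the result more concrete. The sketch below assumes a small discrete joint distribution in place of the continuous $p(\mathbf{x}, t)$, so the integrals become sums. The table `joint`, the grid `t_values`, and the helper names are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical discrete joint distribution p(x, t): 3 values of x, 4 values of t.
rng = np.random.default_rng(0)
t_values = np.array([-1.0, 0.0, 1.0, 2.0])
joint = rng.random((3, 4))
joint /= joint.sum()  # normalise so the table sums to 1


def conditional_mean(joint, t_values):
    """Discrete analogue of (4): sum_t t p(x, t) / p(x), one value per x."""
    p_x = joint.sum(axis=1)          # sum rule: p(x) = sum_t p(x, t)
    return (joint @ t_values) / p_x  # product rule: p(t | x) = p(x, t) / p(x)


def expected_loss(y, joint, t_values):
    """Discrete analogue of (2) for a candidate y with one entry per x."""
    squared_error = (y[:, None] - t_values[None, :]) ** 2
    return float((squared_error * joint).sum())


y_star = conditional_mean(joint, t_values)
best = expected_loss(y_star, joint, t_values)

# Randomly perturbed candidates should never achieve a lower expected loss.
perturbed = [
    expected_loss(y_star + rng.normal(scale=0.5, size=3), joint, t_values)
    for _ in range(1000)
]
print("conditional mean per x:", y_star)
print("loss at conditional mean:", best)
print("smallest perturbed loss:", min(perturbed))
```

The smallest perturbed loss should come out no lower than the loss at the conditional mean, which is what (4) predicts for this toy setting.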


#### Best Answer

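I am assuming your difficulty is in the jump between Eq. 2 and Eq. 3. All you need is an Euler-Lagrange equation, as in their equation (3). In their notation, $f(x, y, \dot{y})$ would be your $\int \{y(x) - t\}^2 \, p(x, t) \, dt$, which does not depend on $\dot{y}$, so that $\partial f / \partial \dot{y} = 0$, for instance. The Euler-Lagrange equation then reduces to $\partial f / \partial y = 0$, which is Eq. 3 of the question.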
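To spell the step out without quoting the Euler-Lagrange equation, here is a sketch of the direct variational argument. The shorthand $F[y]$ for $\mathbb{E}[L]$, the perturbation $\eta(\mathbf{x})$, and the small scalar $\epsilon$ are my own notation, not the book's. Perturb $y$ by $\epsilon \eta$ and differentiate with respect to $\epsilon$ at $\epsilon = 0$:

$$
\begin{aligned}
F[y + \epsilon \eta] &= \int \int \{ y(\mathbf{x}) + \epsilon \eta(\mathbf{x}) - t \}^{2} \, p(\mathbf{x}, t) \, dt \, d\mathbf{x}, \\
\left. \frac{d}{d\epsilon} F[y + \epsilon \eta] \right|_{\epsilon = 0} &= \int \eta(\mathbf{x}) \left[ 2 \int \{ y(\mathbf{x}) - t \} \, p(\mathbf{x}, t) \, dt \right] d\mathbf{x}.
\end{aligned}
$$

At a minimum this must vanish for every choice of $\eta$, which forces the bracket to be zero for each $\mathbf{x}$; the bracket is the functional derivative in Eq. 3. Since $y(\mathbf{x})$ does not depend on $t$, it can be pulled out of the inner integral:

$$
y(\mathbf{x}) \int p(\mathbf{x}, t) \, dt = \int t \, p(\mathbf{x}, t) \, dt
\quad \Longrightarrow \quad
y(\mathbf{x}) \, p(\mathbf{x}) = \int t \, p(\mathbf{x}, t) \, dt,
$$

where the sum rule gives $\int p(\mathbf{x}, t) \, dt = p(\mathbf{x})$. Dividing by $p(\mathbf{x})$ and using the product rule $p(\mathbf{x}, t) = p(t \mid \mathbf{x}) \, p(\mathbf{x})$ gives Eq. 4.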