# Loss function for linear regression with calculus of variations

I'm struggling with the mathematics behind linear regression. Below I have pasted the text from the book Pattern Recognition and Machine Learning (p. 46), where the author derives the regression function \$\mathbb{E}_{t}[t \mid \mathbf{x}]\$. I want to understand the steps from equation (2) to the final result. Could somebody point me to the concepts from the calculus of variations that I should study (links are welcome)?

The average, or expected, loss is given by

\$\$
\mathbb{E}[L] = \iint L(t, y(\mathbf{x})) \, p(\mathbf{x}, t) \, d\mathbf{x} \, dt.
\tag{1}
\$\$

A common choice of loss function in linear regression is the squared loss given by \$L(t, y(\mathbf{x})) = \{ y(\mathbf{x}) - t \}^{2}\$. In this case, the expected loss can be written as

\$\$
\mathbb{E}[L] = \iint \{ y(\mathbf{x}) - t \}^{2} \, p(\mathbf{x}, t) \, d\mathbf{x} \, dt.
\tag{2}
\$\$

Our goal is to choose \$y(\mathbf{x})\$ so as to minimize \$\mathbb{E}[L]\$. We can do this using the calculus of variations to give

\$\$
\dfrac{\delta \mathbb{E}[L]}{\delta y(\mathbf{x})} = 2 \int \{ y(\mathbf{x}) - t \} \, p(\mathbf{x}, t) \, dt = 0.
\tag{3}
\$\$
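
One way to see where (3) comes from, as a sketch rather than the book's own wording: perturb \$y(\mathbf{x})\$ by \$\epsilon \eta(\mathbf{x})\$, where \$\eta\$ is an arbitrary function and \$\epsilon\$ a small number (both symbols are introduced here for illustration and do not appear in the excerpt), then differentiate the expected loss (2) with respect to \$\epsilon\$ at \$\epsilon = 0\$:

\$\$
\begin{aligned}
\left. \dfrac{d}{d\epsilon} \iint \{ y(\mathbf{x}) + \epsilon \eta(\mathbf{x}) - t \}^{2} \, p(\mathbf{x}, t) \, d\mathbf{x} \, dt \right|_{\epsilon = 0}
&= 2 \iint \eta(\mathbf{x}) \{ y(\mathbf{x}) - t \} \, p(\mathbf{x}, t) \, d\mathbf{x} \, dt \\
&= \int \eta(\mathbf{x}) \left[ 2 \int \{ y(\mathbf{x}) - t \} \, p(\mathbf{x}, t) \, dt \right] d\mathbf{x}.
\end{aligned}
\$\$

At a minimum this expression has to vanish for every choice of \$\eta\$, which is only possible if the bracketed term is zero for each \$\mathbf{x}\$. That bracketed term is what the functional derivative in (3) denotes. Note that \$y(\mathbf{x})\$ does not depend on \$t\$, so it can be pulled out of the inner integral in the next step.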

Solving for \$y(\mathbf{x})\$, and using the sum and product rules of probability, we obtain

\$\$
y(\mathbf{x}) = \dfrac{\int t \, p(\mathbf{x}, t) \, dt}{p(\mathbf{x})} = \int t \, p(t \mid \mathbf{x}) \, dt = \mathbb{E}_{t}[t \mid \mathbf{x}]
\tag{4}
\$\$
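
The result (4) can also be checked numerically. The following is only a minimal sketch: the toy joint distribution (a discrete \$\mathbf{x}\$ with three values and Gaussian noise on \$t\$), the lookup-table form of \$y\$, and all variable names are assumptions made here for illustration and are not taken from the book. It estimates the expected squared loss by Monte Carlo and compares the conditional mean against randomly perturbed predictors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy joint distribution p(x, t):
# x is uniform on {0, 1, 2}; t given x is Gaussian around a mean that depends on x.
n = 200_000
x = rng.integers(0, 3, size=n)
true_cond_mean = np.array([-1.0, 0.5, 2.0])
t = true_cond_mean[x] + rng.normal(0.0, 1.0, size=n)


def expected_loss(y_table):
    """Monte Carlo estimate of E[{y(x) - t}^2], with y given as a lookup table over x."""
    return np.mean((y_table[x] - t) ** 2)


# Empirical version of E_t[t | x]: average t separately for each value of x.
cond_mean = np.array([t[x == k].mean() for k in range(3)])

# Loss of the conditional mean versus 20 randomly perturbed predictors.
best = expected_loss(cond_mean)
perturbed = [
    expected_loss(cond_mean + rng.normal(0.0, 0.3, size=3)) for _ in range(20)
]

print("conditional mean per x:", cond_mean)
print("loss at conditional mean:", best)
print("smallest perturbed loss:", min(perturbed))
```

In this setup every perturbed predictor should give a larger estimated loss than the conditional mean, which is what (4) predicts.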
