Derivative of $(y-XB)' h(y-XB)$ with respect to $B$

Let $X$ be an $n\times p$ matrix, $y_{n\times 1}$ a vector, and $B_{p\times 1}$ a vector of coefficients so that $y=XB$. Then what is the derivative of
$$
(y-XB)' h(y-XB)
$$
with respect to $B$, where $h(\cdot)$ is a differentiable function $R^n\rightarrow R^n$ (e.g., $h(z)=z$) and $'$ denotes the transpose of a matrix?

Apply the Chain Rule. (It's the only rule you need to know.) To do so, you need to break the overall function into the composition of functions whose derivatives you can find. This is typically done by inspecting its formula and unwinding it from the outside in.

Let all vectors be column vectors and identify $\mathbb{R}^n\times \mathbb{R}^n$ with $\mathbb{R}^{2n}$ by stacking the two components, $(\mathbf{x}, \mathbf{y}) = \pmatrix{\mathbf{x}\\ \mathbf{y}}.$

The last operation is a function

$$u: \mathbb{R}^{2n} \to \mathbb{R};\quad u\left(\pmatrix{\mathbf{x}\\ \mathbf{y}}\right)=\mathbf{x}^\prime \mathbf{y}.$$

The penultimate operation is

$$v:\mathbb{R}^n \to \mathbb{R}^{2n};\quad v(\mathbf{x})=\pmatrix{\mathbf{x}\\ h(\mathbf{x})}.$$

The first, innermost operation is

$$w:\mathbb{R}^p \to \mathbb{R}^n; \quad w(\mathbf{b}) = y - X\mathbf{b}.$$

Their composition is the function $f$ given by

$$u\circ v \circ w: \mathbb{R}^p \,\xrightarrow{\,w\,}\,\mathbb{R}^n\,\xrightarrow{\,v\,}\,\mathbb{R}^{2n}\,\xrightarrow{\,u\,}\,\mathbb{R}; \quad f(\mathbf{b}) = (u\circ v\circ w)(\mathbf{b}) = u(v(w(\mathbf{b}))).$$

The Chain Rule asserts that $f$ is differentiable when each of $u$, $v$, $w$ is differentiable, and that its derivative $Df:\mathbb{R}^p \to \mathbb{R}$ (which will be written as a $1\times p$ matrix) is the composition of the derivatives (each evaluated at the appropriate values),

$$Df = Du \circ Dv \circ Dw.$$
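Spelled out with the evaluation points and the matrix sizes (a bookkeeping restatement that uses only the definitions of $u$, $v$, and $w$ above), the composition reads

$$D(u\circ v\circ w)(\mathbf{b}) = \underbrace{(Du)\bigl(v(w(\mathbf{b}))\bigr)}_{1\times 2n}\ \underbrace{(Dv)\bigl(w(\mathbf{b})\bigr)}_{2n\times n}\ \underbrace{(Dw)(\mathbf{b})}_{n\times p},$$

so the product is a $1\times p$ row, as it should be.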

You need to find those derivatives. They are

$$(Dw)(\mathbf{b}) = -X,$$

$$(Dv)(\mathbf{x}) = \pmatrix{\mathbb{I}_n \\ (Dh)(\mathbf{x})};\quad \mathbf{x} = w(\mathbf{b});$$

(remember, $Dh$ is an $n\times n$ matrix), and

$$(Du)(\mathbf{x}, \mathbf{y}) = \pmatrix{\mathbf{y}^\prime & \mathbf{x}^\prime};\quad \mathbf{x}=w(\mathbf{b});\quad \mathbf{y} = h(w(\mathbf{b})).$$
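One way to see where the derivative of $u$ comes from is to expand $u$ at a perturbed point. The increments $\mathbf{s}$ and $\mathbf{t}$ are symbols introduced here only for this check:

$$u\left(\pmatrix{\mathbf{x}+\mathbf{s}\\ \mathbf{y}+\mathbf{t}}\right) = (\mathbf{x}+\mathbf{s})^\prime(\mathbf{y}+\mathbf{t}) = \mathbf{x}^\prime\mathbf{y} + \mathbf{y}^\prime\mathbf{s} + \mathbf{x}^\prime\mathbf{t} + \mathbf{s}^\prime\mathbf{t}.$$

The part that is linear in the increments is $\mathbf{y}^\prime\mathbf{s} + \mathbf{x}^\prime\mathbf{t} = \pmatrix{\mathbf{y}^\prime & \mathbf{x}^\prime}\pmatrix{\mathbf{s}\\ \mathbf{t}}$, while $\mathbf{s}^\prime\mathbf{t}$ is of second order. So, written as a $1\times 2n$ matrix, the derivative of $u$ is the row formed by $\mathbf{y}^\prime$ followed by $\mathbf{x}^\prime$.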

To obtain the answer, do the matrix multiplication and plug in the values $\mathbf{x} = w(\mathbf{b})$ and $\mathbf{y} = h(\mathbf{x}) = h(w(\mathbf{b}))$.
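As a sanity check, carrying out that multiplication should give $-\bigl(h(\mathbf{x})^\prime + \mathbf{x}^\prime (Dh)(\mathbf{x})\bigr)X$ with $\mathbf{x} = y - X\mathbf{b}$, which reduces to the familiar least-squares gradient $-2(y-X\mathbf{b})^\prime X$ when $h(z)=z$. The NumPy sketch below multiplies the three derivative matrices and compares the result with central finite differences. The elementwise $\tanh$ used for $h$, the random data, and the names `f`, `Df`, and `numeric_grad` are all assumptions made for illustration; none of them come from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 3
X = rng.normal(size=(n, p))   # arbitrary test data (assumption)
y = rng.normal(size=n)
b = rng.normal(size=p)

def h(z):
    # Illustrative choice of h (assumption): elementwise tanh.
    return np.tanh(z)

def Dh(z):
    # Jacobian of elementwise tanh: a diagonal n x n matrix.
    return np.diag(1.0 - np.tanh(z) ** 2)

def f(b):
    # The composite u(v(w(b))) = (y - Xb)' h(y - Xb).
    x = y - X @ b
    return x @ h(x)

def Df(b):
    # Chain rule: (1 x 2n) times (2n x n) times (n x p).
    x = y - X @ b
    Du = np.concatenate([h(x), x])       # row (y', x') with y = h(x)
    Dv = np.vstack([np.eye(n), Dh(x)])   # identity stacked on Dh(x)
    Dw = -X
    return Du @ Dv @ Dw                  # length-p array (the 1 x p row)

def numeric_grad(func, b, eps=1e-6):
    # Central finite differences, one coordinate at a time.
    g = np.zeros_like(b)
    for j in range(b.size):
        e = np.zeros_like(b)
        e[j] = eps
        g[j] = (func(b + e) - func(b - e)) / (2 * eps)
    return g

analytic = Df(b)
numeric = numeric_grad(f, b)
print(np.allclose(analytic, numeric, atol=1e-6))
```

Replacing `h` with the identity and `Dh` with `np.eye(n)` gives a quick way to compare against $-2(y-X\mathbf{b})^\prime X$ directly.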


Reference

Michael Spivak, Calculus on Manifolds (1965).
