# Solved – How to derive the recursive equation for back propagation for neural networks for $\delta_j = \frac{\partial E_n}{\partial a_j}$

I am following the derivation of backpropagation presented in Bishop's book Pattern Recognition and Machine Learning, and I got confused by one step of the derivation in section 5.3.1.

In that section the chain rule for partial derivatives is applied to the definition of $\delta_j$, which gives equation (5.55):

$$ \delta_j \equiv \frac{\partial E_n}{\partial a_j} = \sum_k \frac{\partial E_n}{\partial a_k} \frac{\partial a_k}{\partial a_j} $$

where the sum runs over all units \$k\$ to which unit \$j\$ sends connections.

My question is how they get from equation (5.55) to equation (5.56):

$$ \delta_j = h'(a_j) \sum_k w_{kj} \delta_k $$
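As a sanity check that (5.56) really equals $\partial E_n / \partial a_j$, here is a minimal numerical sketch. It is not from the book: the tiny network, the `tanh` activation, the linear output units with a sum-of-squares error (so that $\delta_k = a_k - t_k$ at the outputs), and all function names are assumptions chosen only for illustration.

```python
import math

def h(a):
    # Assumed hidden activation function: tanh
    return math.tanh(a)

def h_prime(a):
    # Derivative of tanh
    return 1.0 - math.tanh(a) ** 2

def output_activations(a_hidden, w_out):
    # z_j = h(a_j), then a_k = sum_j w_kj * z_j
    z_hidden = [h(a_j) for a_j in a_hidden]
    return [sum(w_kj * z_j for w_kj, z_j in zip(row, z_hidden)) for row in w_out]

def error(a_hidden, w_out, targets):
    # Assumed error: sum-of-squares with linear output units
    a_out = output_activations(a_hidden, w_out)
    return 0.5 * sum((a_k - t_k) ** 2 for a_k, t_k in zip(a_out, targets))

def backprop_deltas(a_hidden, w_out, targets):
    # delta_k = a_k - t_k for linear outputs with sum-of-squares error
    a_out = output_activations(a_hidden, w_out)
    delta_out = [a_k - t_k for a_k, t_k in zip(a_out, targets)]
    # delta_j = h'(a_j) * sum_k w_kj * delta_k
    return [
        h_prime(a_j) * sum(row[j] * d_k for row, d_k in zip(w_out, delta_out))
        for j, a_j in enumerate(a_hidden)
    ]

def finite_difference_deltas(a_hidden, w_out, targets, eps=1e-6):
    # Central-difference estimate of dE/da_j, perturbing one a_j at a time
    deltas = []
    for j in range(len(a_hidden)):
        plus = list(a_hidden)
        minus = list(a_hidden)
        plus[j] += eps
        minus[j] -= eps
        diff = error(plus, w_out, targets) - error(minus, w_out, targets)
        deltas.append(diff / (2 * eps))
    return deltas

# Arbitrary illustrative values: 3 hidden units j, 2 output units k
a_hidden = [0.3, -1.2, 0.8]
w_out = [[0.5, -0.4, 0.1],
         [1.5, 0.2, -0.7]]
targets = [0.25, -0.5]

analytic = backprop_deltas(a_hidden, w_out, targets)
numeric = finite_difference_deltas(a_hidden, w_out, targets)
print(analytic)
print(numeric)
```

If the formula is right, the two printed lists should agree to several decimal places, with the remaining gap coming from the finite-difference approximation.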

The book tries to explain where that equation comes from in the following paragraph:

If we now substitute the definition of $\delta$ given by equation (5.51), $\delta_j \equiv \frac{\partial E_n}{\partial a_j}$, into equation (5.55), $\delta_j \equiv \frac{\partial E_n}{\partial a_j} = \sum_k \frac{\partial E_n}{\partial a_k} \frac{\partial a_k}{\partial a_j}$, and make use of (5.48), $a_j = \sum_i w_{ji} z_i$, and (5.49), $z_j = h(a_j)$, we obtain the following backpropagation formula (5.56): $\delta_j = h'(a_j) \sum_k w_{kj} \delta_k$

Basically, it's not 100% clear to me how those steps lead to $\delta_j = h'(a_j) \sum_k w_{kj} \delta_k$ starting from $\delta_j \equiv \frac{\partial E_n}{\partial a_j} = \sum_k \frac{\partial E_n}{\partial a_k} \frac{\partial a_k}{\partial a_j}$.

I've tried applying those steps and I will show what I have tried so far:

First I substituted the definition of $\delta$ into the multivariable chain rule, which takes $\delta_j \equiv \frac{\partial E_n}{\partial a_j} = \sum_k \frac{\partial E_n}{\partial a_k} \frac{\partial a_k}{\partial a_j}$ to:

$$ \delta_j = \sum_k \delta_k \frac{\partial a_k}{\partial a_j} $$

Then I guessed that they somehow used the chain rule again on $\frac{\partial a_k}{\partial a_j}$, bringing in $\frac{\partial h(a_j)}{\partial a_j} = h'(a_j)$, and then substituted the result back. However, it is not clear to me exactly how that was done. Does anyone have an idea?
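For concreteness, here is a sketch of how that guess could be carried through using only (5.48) and (5.49). It assumes that unit $k$ receives its inputs from units $i$ (one of which is $j$), and that the other $a_i$ are held fixed when differentiating with respect to $a_j$:

$$
\begin{aligned}
a_k &= \sum_i w_{ki} z_i = \sum_i w_{ki}\, h(a_i) \\
\frac{\partial a_k}{\partial a_j} &= w_{kj}\, h'(a_j) \\
\delta_j &= \sum_k \delta_k \frac{\partial a_k}{\partial a_j} = \sum_k \delta_k\, w_{kj}\, h'(a_j) = h'(a_j) \sum_k w_{kj}\, \delta_k
\end{aligned}
$$

Under those assumptions only the $i = j$ term of the sum depends on $a_j$, and $h'(a_j)$ can be pulled out of the sum over $k$ because it does not depend on $k$.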

As a reference I will paste the relevant section of the book to help:
