In simple neural network backpropagation, we normally run one round of forward and backward propagation in every iteration. Let's assume we have one training example of arbitrary dimensions and some initial weights. Using forward propagation, we calculate the predicted output. This predicted output is then used to calculate the total error, which is back-propagated to recalculate the weights. After recalculating the weights for all the layers, we update the weights of all the layers at once. It's not that we first update the weights of one layer and then the next; instead, we first recalculate the weights of all layers (layer by layer) and then update them all at once. We can summarize this as: "recalculate the weights layer by layer, then update all the layers with the recalculated weights at once".
Does this make sense? Is this the right way to update weights using back propagation?
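The scheme described above, computing every layer's gradient from the old weights before applying any update, might be sketched as follows. This is a minimal illustration with made-up shapes, a sigmoid activation, and a squared-error loss, not a reference implementation:

```python
import numpy as np

# Toy 2-layer network: input -> hidden (sigmoid) -> output (sigmoid).
# All shapes and values here are hypothetical, chosen for illustration.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 1))        # one training example
y = np.array([[1.0]])              # its target
W1 = rng.normal(size=(4, 3))       # layer-1 weights
W2 = rng.normal(size=(1, 4))       # layer-2 weights
lr = 0.1                           # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass
a1 = sigmoid(W1 @ x)
a2 = sigmoid(W2 @ a1)

# Backward pass: compute BOTH layers' gradients from the OLD weights first...
delta2 = (a2 - y) * a2 * (1 - a2)          # output-layer error term
delta1 = (W2.T @ delta2) * a1 * (1 - a1)   # uses the old W2, not an updated one
grad_W2 = delta2 @ a1.T
grad_W1 = delta1 @ x.T

# ...and only then update all layers at once.
W2 -= lr * grad_W2
W1 -= lr * grad_W1
```

The key point is that `delta1` is computed from the old `W2`; if `W2` were updated first, the layer-1 gradient would no longer match the error that was actually produced.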
Now let's assume I have "m" examples instead of just one. With "m" examples, each of these small gradient steps is taken after one back-propagation pass over all "m" examples.
I am confused about how back propagation handles these "m" examples one by one. It seems it first takes the first example and computes the weight updates, then takes the second example and computes them again, then the third, and so on. Only after it has run over all the examples does it take a single step towards the optimum. If that is the case, is there any relation between the weights computed for one example and the weights computed for another, given that back propagation recalculates the weights for each example in sequence?
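One way to see that the per-example computations are independent of each other is that, in batch gradient descent, every example's gradient is evaluated against the same current weights; the gradients are only accumulated, and a single averaged step is taken at the end. A minimal sketch, using a hypothetical one-layer linear model with squared error:

```python
import numpy as np

# Batch gradient descent sketch: all m per-example gradients are computed
# against the SAME current weights, then one step uses their average.
# Data and shapes are hypothetical, for illustration only.
rng = np.random.default_rng(1)
m = 5
X = rng.normal(size=(m, 3))        # m examples, 3 features each
Y = rng.normal(size=(m, 1))        # m targets
W = rng.normal(size=(3, 1))        # current weights
lr = 0.05

grad_sum = np.zeros_like(W)
for i in range(m):
    x, y = X[i:i + 1].T, Y[i:i + 1]
    pred = x.T @ W                 # forward pass with the unchanged W
    err = pred - y
    grad_sum += x @ err            # gradient of 0.5 * err^2 w.r.t. W

W -= lr * grad_sum / m             # one step using the averaged gradient
```

Because `W` is never modified inside the loop, example 2's gradient does not depend on example 1's; the only interaction between examples is the averaging of their gradients.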
A batch of data is taken for the feed-forward pass, and back-propagation is performed on the examples in that batch. Weights and biases are updated based on the average error over the batch. The resulting weight changes are applied to the previous weights before the feed-forward pass on the next batch of data.
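This mini-batch procedure might be sketched as the loop below, again with a hypothetical linear model standing in for the network; each batch gets one forward/backward pass and one update, and the next batch sees the updated weights:

```python
import numpy as np

# Mini-batch sketch: split the data into batches, take one averaged
# gradient step per batch. Shapes and values are hypothetical.
rng = np.random.default_rng(2)
X = rng.normal(size=(12, 3))
Y = rng.normal(size=(12, 1))
W = rng.normal(size=(3, 1))
lr, batch_size = 0.05, 4

for start in range(0, len(X), batch_size):
    xb = X[start:start + batch_size]        # one batch of examples
    yb = Y[start:start + batch_size]
    err = xb @ W - yb                       # feed-forward on the batch
    grad = xb.T @ err / len(xb)             # average gradient over the batch
    W -= lr * grad                          # update before the next batch
```

With `batch_size` equal to the dataset size this reduces to batch gradient descent, and with `batch_size = 1` it becomes stochastic (per-example) updating.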