I'm still trying to understand dropout completely, but this is what I think is happening so far:
At each step there is a chance p of a unit being set to zero.
If a rectified linear unit (ReLU) is used for activation, then a weight of zero can often result in 'dead' units.
If I run a network for a long time (i.e. towards infinity), will all my units become stuck at zero?
I think the answer is no, but I'm not sure why or what process is involved.
Best Answer
I think you're conflating the dead ReLU problem and dropout. ReLU nets often use dropout, but they are not the same.
Dropout
Using dropout "freezes" some units (at random and usually but not always independently) by ignoring their weights at each iteration. The frozen units are not set to zero, but for that iteration, the network pretends that they are zero. The frozen units are not updated for this iteration.
For the next iteration, all units, including the ones that were frozen during the last iteration, are available to be frozen (again, at random). It's possible for some units to be frozen several times in a row (albeit with diminishing probability).
The non-frozen units are evaluated and updated as usual.
You can use dropout with any type of neuron.
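As a rough sketch of that per-iteration masking, here is a minimal (inverted) dropout forward pass in NumPy. The helper name and the rescaling by 1 - p are illustrative assumptions, not the answer's exact formulation; the point is that a fresh random mask is drawn every iteration, so different units are "frozen" each time.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(activations, p, training=True):
    """Drop each unit with probability p for this iteration only.

    Dropped units contribute nothing to this forward pass, but their
    weights are untouched and they can be active again next iteration.
    """
    if not training or p == 0.0:
        return activations, None
    mask = rng.random(activations.shape) >= p      # keep each unit with probability 1 - p
    out = activations * mask / (1.0 - p)           # rescale so the expected activation is unchanged
    return out, mask

# The same layer output gets a different random mask on each "iteration".
h = np.array([0.5, 1.2, 0.3, 2.0])
out1, mask1 = dropout_forward(h, p=0.5)
out2, mask2 = dropout_forward(h, p=0.5)
print(mask1, mask2)   # typically different units are dropped each time
```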
Dead ReLU
So-called "dead neurons" or "dead ReLUs" happen when the weights update in such a way that the unit always returns 0 for any of the inputs in the data set; for example, if the inputs are between 0 and 1, but the weight is negative, then the ReLU always returns 0. Once this happens, the back-propagation can't update the unit: because it's output is always 0, so is its gradient, and the weight never changes. ELUs and other variations on ReLU can ameliorate this.
Similar Posts:
- Solved – Does it make sense to use a dropout layer in a neural network for a regression to predict an absolute Error
- Solved – ReLUs and Gradient Descent for Deep Neural Nets