I'm still trying to understand dropout completely, but this is what I think is happening so far:

At each step there is a chance *p* of a unit being set to zero. If a rectified linear unit (ReLU) is used for activation, then a weight of zero can often result in 'dead' units.

If I run a network for a long time (i.e., as the number of training steps goes towards infinity), will all my units eventually become stuck at zero?

I think the answer is no, but I'm not sure why or what process is involved.


#### Best Answer

I think you're conflating the dead ReLU problem and dropout. ReLU nets often use dropout, but they are not the same.

**Dropout**

Using dropout "freezes" some units (at random and usually but not always independently) by ignoring their weights at each iteration. The frozen units are not *set* to zero, but for that iteration, the network *pretends* that they are zero. The frozen units are not updated for this iteration.

For the next iteration, **all** units, including the ones that were frozen during the last iteration, are available to be frozen (again, at random). It's possible for some units to be frozen several times in a row (albeit with diminishing probability).

The non-frozen units are evaluated and updated as usual.

You can use dropout with any type of neuron.
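To make the "pretends they are zero" point concrete, here is a minimal sketch of a dropout forward pass in NumPy. It uses the common "inverted dropout" rescaling and a made-up helper name (`dropout_forward`), so treat it as an illustration of the idea rather than any particular library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(activations, p, training=True):
    """Each unit is 'frozen' (masked to zero) with probability p for this
    iteration only. The weights themselves are never set to zero; the mask
    is redrawn independently on every iteration."""
    if not training:
        return activations                         # at test time all units are used
    mask = rng.random(activations.shape) >= p      # True = unit kept this iteration
    return activations * mask / (1.0 - p)          # rescale the kept units

# The same layer output gets a different random mask each step, so a unit
# frozen on one iteration can be active again on the next one.
h = np.array([0.5, 1.2, 0.3, 2.0])
for step in range(3):
    print(step, dropout_forward(h, p=0.5))
```

Running this a few times shows that which units are zeroed changes from iteration to iteration, which is why no unit stays "stuck" just because of dropout.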

**Dead ReLU**

So-called "dead neurons" or "dead ReLUs" happen when the weights update in such a way that the unit returns 0 for every input in the data set; for example, if the inputs are all between 0 and 1 but the weights are negative, then the ReLU always returns 0. Once this happens, back-propagation can't revive the unit: because its output is always 0, so is its gradient, and the weights never change. ELUs and other variations on ReLU can ameliorate this.
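A tiny numerical illustration of that situation (a toy one-weight unit, not from the original post): the inputs are all positive, the weight is negative, so the pre-activation is never above zero and the gradient with respect to the weight vanishes.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Inputs are all in [0, 1]; the unit's weight is negative, so the
# pre-activation z = w * x is <= 0 for every example in the data set.
x = np.array([0.1, 0.4, 0.7, 1.0])
w = -2.0

z = w * x
a = relu(z)                        # output is 0 for every input
grad_mask = (z > 0).astype(float)  # dReLU/dz is 0 wherever z <= 0
grad_w = np.sum(grad_mask * x)     # whatever the upstream loss gradient is,
                                   # it gets multiplied by this 0

print(a)       # [0. 0. 0. 0.]
print(grad_w)  # 0.0 -> the weight never gets updated: a "dead" ReLU
```

Nothing in this loop ever pushes `w` back into a range where the unit can fire again, which is the sense in which the unit is "dead", and it is a property of the weights, not of dropout.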
