Solved – Are all neural network activation functions differentiable?

When we design a neural network, we use gradient descent to learn the parameters. Does this require the activation function to be differentiable?

No! For example, ReLU, a widely used activation function, is not differentiable at $z=0$. However, such functions are usually non-differentiable at only a small number of points, and at those points they have well-defined left and right derivatives. In practice, we simply use one of the one-sided derivatives. This is reasonable because digital computers are subject to numerical error anyway: an input of exactly $z=0$ was most likely some small value that got rounded to zero. Read chapter 6 of the following book for more details on activation functions:
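As a minimal sketch of this convention, here is ReLU together with a hand-written gradient that adopts the left derivative at $z=0$ (so the gradient there is $0$, matching the common choice in practice; the function names are illustrative, not from any particular library):

```python
import numpy as np

def relu(z):
    """ReLU activation: max(0, z), applied elementwise."""
    return np.maximum(0.0, z)

def relu_grad(z):
    """One-sided derivative of ReLU.

    ReLU is not differentiable at z = 0: the left derivative is 0 and
    the right derivative is 1. Here we adopt the common convention of
    using the left derivative, so relu_grad returns 0 at z = 0.
    """
    return (np.asarray(z) > 0).astype(np.float64)

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))       # [0. 0. 3.]
print(relu_grad(z))  # [0. 0. 1.]
```

Note that the kink only matters at a single point: everywhere else the gradient is exactly 0 or exactly 1, which is part of why ReLU works so well with gradient descent.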

Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016.
