The ReLU function, along with its modifications (ELU, leaky ReLU), is commonly used as an activation function in machine learning.
The overall idea of these functions is the same: for $x < 0$ the value of the function is small (its limit as $x \to -\infty$ is zero or $-1$), and for $x > 0$ the function grows roughly proportionally to $x$.
The exponential function ($e^x$ or $e^x - 1$) has similar behavior, and its derivative at $x = 0$ is greater than that of the sigmoid.
The visualization below illustrates the exponential function in comparison with the ReLU and sigmoid activation functions.
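A minimal plotting sketch of such a comparison (my own illustration, not the original figure), assuming numpy and matplotlib are available:

```python
# Sketch of the comparison described above: plots exp(x), ReLU,
# leaky ReLU, ELU and the sigmoid on one set of axes.
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-4, 4, 400)

relu = np.maximum(0, x)
leaky_relu = np.where(x > 0, x, 0.01 * x)
elu = np.where(x > 0, x, np.exp(x) - 1)   # limit -1 as x -> -inf
sigmoid = 1 / (1 + np.exp(-x))
exp_x = np.exp(x)                         # limit 0 as x -> -inf

for y, label in [(relu, "ReLU"), (leaky_relu, "leaky ReLU"),
                 (elu, "ELU"), (sigmoid, "sigmoid"), (exp_x, "exp(x)")]:
    plt.plot(x, y, label=label)

plt.ylim(-2, 5)          # keep exp(x) from dominating the view
plt.axvline(0, color="gray", linewidth=0.5)
plt.legend()
plt.show()
```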
So why is the simple function $y = e^x$ not used as an activation function in neural networks?
Best Answer
I think the most prominent reason is stability. Think about having consecutive layers with exponential activations: when you feed even a small number into the NN (e.g. $x = 1$), the forward calculation looks like $$o = \exp(\exp(\exp(\exp(1)))) \approx e^{3814279}.$$
It can go crazy very quickly, and I don't think you can train deep networks with this activation function unless you add other mechanisms like clipping.
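A small numeric sketch of that instability (my illustration, not part of the original answer): repeatedly applying an exp "activation" to a scalar input of 1 overflows float64 after only a few layers.

```python
# Sketch: repeatedly apply an exp "activation" to a single scalar input
# and watch the forward pass overflow in float64 arithmetic.
import numpy as np

np.seterr(over="ignore")       # silence the overflow warning for the demo
h = np.float64(1.0)            # small input, x = 1
for layer in range(1, 6):
    h = np.exp(h)              # "activation" of this layer
    print(f"after layer {layer}: {h}")
# after layer 1: ~2.718, layer 2: ~15.15, layer 3: ~3814279.1,
# layer 4 already exceeds the float64 range and becomes inf.
```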
Similar Posts:
- Solved – Neural Networks: What activation function should I choose for hidden layers in regression models
- Solved – Do we still need to use tanh and sigmoid activation functions in neural networks, or can we always replace them by ReLU or leaky ReLU
- Solved – Approximating leaky ReLU with a differentiable function