The ReLU function and its modifications (ELU, leaky ReLU) are commonly used as activation functions in machine learning.

The overall idea of these functions is the same: before `x = 0` the value of the function is small (its limit as `x` goes to negative infinity is zero or `-1`), and after `x = 0` the function grows proportionally to `x`.

The exponential function (`e^x` or `e^x - 1`) behaves similarly, and its derivative at `x = 0` is greater than the sigmoid's.

The visualization below compares the exponential with the ReLU and sigmoid activation functions.

So why isn't the simple function `y = e^x` used as an activation function in neural networks?
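The claim about the slope at zero can be checked numerically. The sketch below (my own illustration, not part of the original post; the `deriv` helper is a hypothetical name) estimates the derivatives of `e^x` and the sigmoid at `x = 0` with a central difference:

```python
import math

def sigmoid(x):
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

def deriv(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

print(deriv(math.exp, 0.0))  # about 1.0
print(deriv(sigmoid, 0.0))   # about 0.25
```

The exponential's slope at the origin is 1, four times the sigmoid's maximum slope of 0.25, which is what the question is pointing at.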


#### Best Answer

I think the most prominent reason is stability. Think about stacking consecutive layers with an exponential activation: even for a small input to the NN (e.g. $x=1$), the forward calculation looks like $$o=\exp(\exp(\exp(\exp(1))))\approx e^{3814279}.$$

The output blows up very quickly, and I don't think you can train deep networks with this activation function unless you add other mechanisms such as clipping.

### Similar Posts:

- Solved – Neural Networks: What activation function should I choose for hidden layers in regression models
- Solved – Do we still need to use tanh and sigmoid activation functions in neural networks, or can we always replace them by ReLU or leaky ReLU
- Solved – Approximating leaky ReLU with a differentiable function