Solved – Why are polynomial activation functions not used

Why are polynomial functions bad as activations?

Some work has experimented with quadratic activations (see, for example, "neural tensor networks"). In general, though, a disadvantage of polynomials of second order and higher is that their derivative is unbounded: it grows with the size of the input, which can lead to exploding gradients.
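To make the unbounded-derivative point concrete, here is a minimal sketch comparing a quadratic activation with tanh. It is a toy setup and not taken from any particular paper: a scalar input, no weights or biases, and the same activation applied `depth` times in a row. The helper names (`quadratic`, `chain_gradient`, and so on) are made up for illustration.

```python
import math


def quadratic(x):
    """Toy polynomial activation f(x) = x^2."""
    return x * x


def quadratic_grad(x):
    """f'(x) = 2x, which grows without bound as |x| grows."""
    return 2.0 * x


def tanh_grad(x):
    """d/dx tanh(x) = 1 - tanh(x)^2, which always lies in (0, 1]."""
    return 1.0 - math.tanh(x) ** 2


def chain_gradient(activation, grad, x, depth):
    """Derivative of `activation` composed with itself `depth` times.

    Uses the chain rule: multiply the local derivative at each layer.
    Assumption: no weights or biases, only the activation is stacked.
    """
    total = 1.0
    for _ in range(depth):
        total *= grad(x)
        x = activation(x)
    return total


# Same starting input and depth for both activations.
quad_chain = chain_gradient(quadratic, quadratic_grad, 2.0, 5)
tanh_chain = chain_gradient(math.tanh, tanh_grad, 2.0, 5)

print("quadratic, depth 5:", quad_chain)
print("tanh, depth 5:", tanh_chain)
```

With the quadratic activation, each layer multiplies the gradient by twice the current activation value, and that value itself is being squared at every layer, so the product blows up very quickly. With tanh, every factor is at most 1, so the product cannot grow (it tends to shrink instead, which is the opposite, vanishing-gradient problem).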
