I am using tanh as the activation function for my NN. Previously, when I had sigmoid neurons, I used the cross-entropy cost function. A sigmoid neuron's output can never reach zero, but a tanh neuron's output can, so when I train the NN I get division-by-zero errors. I switched back to the quadratic cost function, but it converges slowly. Is there a way to use the cross-entropy cost with tanh, or is there something better I could use?

#### Best Answer

It's common to use softmax as the final layer. It converts the raw output values into probabilities: each one is positive and together they sum to 1, so the cross-entropy cost is well defined on them. If you use softmax as the activation function for the final layer, you can use any activation you like, including tanh, for the previous layers.
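To make this concrete, here is a minimal sketch in plain Python of a tanh hidden layer feeding a softmax output with a cross-entropy cost. The function names (`softmax`, `cross_entropy`, `forward`) and the toy weights are my own, chosen for illustration and not taken from the question. The loss is computed from the pre-softmax values (logits) with the log-sum-exp trick, which is one common way to avoid taking the log of zero even when floating-point softmax underflows.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)                               # shift by the max so exp() cannot overflow
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(z, target_index):
    """Cross-entropy of softmax(z) against a one-hot target.

    Computed directly from the logits via log-sum-exp, so no log(0)
    or division by zero can occur.
    """
    m = max(z)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
    return log_sum_exp - z[target_index]

def forward(x, W1, b1, W2, b2):
    """tanh hidden layer followed by a linear output layer (the logits)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(W2, b2)]
    return hidden, logits

# Hypothetical toy weights: 2 inputs, 2 tanh hidden units, 3 classes.
W1 = [[0.5, -0.5], [-0.5, 0.5]]
b1 = [0.0, 0.0]
W2 = [[1.0, -1.0], [-1.0, 1.0], [0.0, 0.0]]
b2 = [0.0, 0.0, 0.0]

# This input drives both tanh units to exactly 0, the case that broke
# the sigmoid-style cross-entropy in the question.
hidden, logits = forward([1.0, 1.0], W1, b1, W2, b2)
probs = softmax(logits)
loss = cross_entropy(logits, target_index=0)
```

Here the hidden activations are exactly zero, yet the loss is still finite because only the softmax outputs enter the cross-entropy. For backpropagation, the gradient of this loss with respect to the logits is `probs` minus the one-hot target, so the output layer keeps the fast-learning behaviour that made cross-entropy attractive over the quadratic cost.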
