I am trying to train a fully convolutional neural network for 3D medical image segmentation. I started from the architecture of this paper, with two differences: my images vary in size, so I train the network one image at a time (no batching), and I use ReLUs instead of PReLUs as the non-linearities.

The problem I am having is that the model's outputs before the softmax/sigmoid are far too large (each logit is around 1e32), so the cross-entropy calculation blows up and returns infinity or NaN.
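For context on why the loss blows up: exponentiating a huge logit overflows to infinity. The standard fix is to compute the loss directly from logits with the log-sum-exp trick (this is what "with logits" loss functions in the major frameworks do internally). A minimal NumPy sketch, with hypothetical function names, comparing the naive and stable formulations:

```python
import numpy as np

def cross_entropy_naive(logits, target):
    # Naive: exp() overflows for large logits, producing inf/nan.
    p = np.exp(logits) / np.exp(logits).sum()
    return -np.log(p[target])

def cross_entropy_stable(logits, target):
    # Log-sum-exp trick: subtract the max logit before exponentiating,
    # so the largest exponent is exp(0) = 1 and nothing overflows.
    m = logits.max()
    log_z = m + np.log(np.exp(logits - m).sum())
    return log_z - logits[target]

logits = np.array([1e32, 0.0])
print(cross_entropy_naive(logits, 0))   # → nan (overflow)
print(cross_entropy_stable(logits, 0))  # → 0.0
```

Note that this only makes the loss finite for a given set of logits; it does not stop the network itself from producing ever-larger activations, so it complements rather than replaces a normalisation fix.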

At first I thought this might be due to exploding gradients, so I tried gradient clipping, but the problem remained. I then simply divided the outputs by a large constant (1e32), and the loss started returning finite values.
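It is worth noting what gradient clipping does and does not control: it bounds the size of the update, not the size of the forward activations, which is consistent with the logits still reaching 1e32. A minimal NumPy sketch of global-norm clipping (the helper name is mine, not from any framework):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Scale a list of gradient arrays so their combined L2 norm
    does not exceed max_norm. Returns (clipped_grads, original_norm)."""
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads], total

grads = [np.full(4, 10.0)]            # global norm = sqrt(400) = 20
clipped, norm = clip_by_global_norm(grads, 5.0)
```

Even with clipped updates, if the weights are already large (or the initialisation is poor), the forward pass can still overflow on the very first image, before any gradient step matters.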

My question is: what is the correct (and certainly more elegant) way of keeping the logits in a reasonable range? Perhaps some sort of local normalisation at the end of each convolution layer?
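On the normalisation idea: since training is done one image at a time, batch statistics are unavailable, but instance normalisation (normalising each channel over its own spatial dimensions) works with a batch size of one. A minimal NumPy sketch, assuming a channel-first `(C, D, H, W)` layout for a single 3D image:

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Instance normalisation for one 3D feature map of shape (C, D, H, W):
    each channel is shifted/scaled to zero mean and unit variance over its
    spatial dimensions, bounding activation magnitudes layer by layer."""
    mean = x.mean(axis=(1, 2, 3), keepdims=True)
    var = x.var(axis=(1, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

# Even absurdly large activations come out with zero mean, unit variance.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 4, 4)) * 1e16
y = instance_norm(x)
```

In practice one would also learn a per-channel scale and shift (as batch/instance norm layers in the major frameworks do); the sketch above omits those affine parameters for brevity.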


#### Best Answer

Try either removing some layers or reducing the learning rate. Note, though, that if the explosion happens before the first or second loss is even computed, reducing the LR won't help.

I had the same problem and now I'm stuck with LR=0.001. Tell me if you found something better, so I can try it too.
