There is an answer on the Kaggle question board here by Dr. Fuzzy:
You can assess a total misclassification scenario by plugging zero probabilities into the log-loss function (here sklearn's log_loss):

| LL | Count | Class |
| --- | --- | --- |
| 3.31035117 | 15294 | toxic |
| 0.34523409 | 1595 | severe_toxic |
| 1.82876664 | 8449 | obscene |
| 0.10346200 | 478 | threat |
| 1.70495856 | 7877 | insult |
| 0.30410902 | 1405 | identity_hate |

For some classes the possible LL for total misclassification is really low. In this range gradients might no longer provide meaningful directions. Another point is that most log-loss implementations use clipping for probabilities near 0. This will come into play here as well.
I understand this person is saying "if you never predict the class 'toxic', then what is the log loss?" And the answer is 3.31035117. My question is: how can you possibly get a non-infinite answer?
As far as I know, sklearn's log_loss function computes binary cross-entropy.
The binary cross-entropy for a single label is:

$$-\big(y\log(p) + (1-y)\log(1-p)\big)$$
If the label is "toxic" ($y=1$) but we predict probability 0 ($p=0$), we should get:

$$-\big(1\cdot\log(0) + (1-1)\log(1-0)\big) = -(-\infty + 0) = \infty$$
Why are we not getting infinity here?
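A quick numpy check of that arithmetic (nothing sklearn-specific, just the formula above):

```python
import numpy as np

y, p = 1.0, 0.0  # true label "toxic", predicted probability 0

# Naive binary cross-entropy, exactly as written above.
# np.log(0.0) is -inf, so the whole expression evaluates to +inf.
with np.errstate(divide="ignore"):
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))

print(loss)  # inf
```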
Best Answer
As Dr. Fuzzy points out, sklearn's log_loss uses clipping. This means it never actually plugs $0$ in for $p$; it substitutes some small epsilon value instead. The clipping is there to avoid the infinities/weirdness associated with probabilities of exactly 0 or 1 that you noted.
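Here is a minimal sketch of that mechanism (not sklearn's actual implementation). It assumes a clipping value of eps = 1e-15, which older scikit-learn versions used as the default (newer versions derive it from the float dtype, so exact values vary by version), and a training set of 159,571 rows, the Jigsaw Toxic Comment train set size, which is not stated in the quote:

```python
import numpy as np

EPS = 1e-15  # assumed clipping value (older sklearn default); version-dependent

def clipped_log_loss(y_true, p_pred, eps=EPS):
    """Binary cross-entropy with predictions clipped away from 0 and 1."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

# A single missed positive now costs -log(eps) ~= 34.5 instead of infinity.
print(clipped_log_loss([1], [0.0]))  # ~34.54

# Averaged over a dataset where only a minority of labels are positive, the
# "always predict 0" loss shrinks a lot. With an assumed 159,571 rows and the
# quoted 15,294 positives for "toxic":
n_total, n_toxic = 159_571, 15_294
y_true = np.zeros(n_total)
y_true[:n_toxic] = 1.0
print(clipped_log_loss(y_true, np.zeros(n_total)))  # ~3.31, close to the quoted value
```

So, roughly, predicting 0 everywhere costs about (positives / total) × -log(eps), which is why the rarer classes in the quoted table have such small "total misclassification" losses.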
Similar Posts:
- Solved – Which deep learning model can classify categories which are not mutually exclusive
- Solved – NNs: Multiple Sigmoid + Binary Cross Entropy giving better results than Softmax + Categorical Cross Entropy
- Solved – Multi-class logarithmic loss function per class
- Solved – One-class SVM vs. OneVsRestClassifier for multi-label text classification task