Solved – Binary cross-entropy: plugging in probability 0

There is an answer on the Kaggle question board here by Dr. Fuzzy:

You can assess a total misclassification scenario by plugging zero probabilities into the log-loss function (here sklearn log-loss):

LL            Count    Class
3.31035117    15294    toxic
0.34523409     1595    severe_toxic
1.82876664     8449    obscene
0.10346200      478    threat
1.70495856     7877    insult
0.30410902     1405    identity_hate

For some classes the possible LL for total misclassification is really low. In this range gradients might no longer provide meaningful directions. Another point is that most log-loss implementations use clipping for probabilities near 0. This comes into play here as well.
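For context, the LL column can be reproduced almost exactly under two assumptions that the quote does not state: the counts come from the Kaggle Toxic Comment training set of roughly 159,571 comments, and a predicted probability of exactly 0 gets clipped to 1e-15 (the value sklearn has long used). A short sketch:

```python
import numpy as np

# Assumptions (not stated in the quoted post): total number of training
# comments, and the epsilon that a zero probability is clipped to.
N = 159_571
eps = 1e-15

counts = {
    "toxic": 15294,
    "severe_toxic": 1595,
    "obscene": 8449,
    "threat": 478,
    "insult": 7877,
    "identity_hate": 1405,
}

for cls, n_pos in counts.items():
    # Each positive sample contributes -log(eps); each negative sample
    # contributes -log(1 - eps), which is essentially 0.
    ll = -(n_pos * np.log(eps) + (N - n_pos) * np.log(1 - eps)) / N
    print(f"{cls:15s} {ll:.8f}")
```

Because every positive sample contributes the same fixed penalty of $-\log(10^{-15}) \approx 34.5$, the per-class LL is just that penalty scaled by class prevalence, which is why rare classes like threat end up with such a small worst-case loss.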

I understand this person is saying "if you never predict the class 'toxic', then what is the log loss?" And the answer is 3.31035117. My question is: how can you possibly get a non-infinite answer?

As far as I know, sklearn's log_loss function is binary cross-entropy.

The binary cross-entropy function is (for a single label):

$-\big(y \log(p) + (1 - y)\log(1 - p)\big)$

If the label is "toxic" ($y = 1$), but we associate that with probability 0 ($p = 0$), we should get:

$-\big(1 \cdot \log(0) + (1 - 1)\log(1 - 0)\big) = -(-\infty + 0) = \infty$
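A quick numerical check of that substitution, using plain NumPy rather than sklearn (NumPy warns about log(0) but still returns -inf):

```python
import numpy as np

def naive_bce(y, p):
    # Binary cross-entropy with no clipping at all.
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

print(naive_bce(1.0, 0.5))  # 0.693..., the usual finite case
print(naive_bce(1.0, 0.0))  # inf: 1*log(0) = -inf and (1-1)*log(1) = 0
```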

Why are we not getting infinity here?

As Dr. Fuzzy points out, sklearn's log_loss uses clipping. This means it never actually plugs in $p = 0$; instead it substitutes a small epsilon value (and $1 - \epsilon$ in place of $p = 1$), which avoids the infinities and ill-defined values you noted at probabilities of exactly 0 or 1.
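A minimal sketch of what that clipping does, assuming $\epsilon = 10^{-15}$ (sklearn's long-standing default; newer versions derive it from the prediction dtype instead):

```python
import numpy as np

def clipped_bce(y, p, eps=1e-15):
    # Clip predictions away from exactly 0 and 1 before taking logs,
    # the same trick sklearn's log_loss relies on.
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y_true = np.array([1.0, 0.0, 1.0, 0.0])
p_zero = np.zeros(4)  # "never predict the class": probability 0 everywhere

print(clipped_bce(y_true, p_zero))  # ~17.27, i.e. -0.5 * log(1e-15): large but finite
```

With clipping, the worst possible per-sample penalty is capped at $-\log(\epsilon) \approx 34.5$ instead of infinity, which is why the per-class values in the quoted table stay finite.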
