Solved – Naive Bayes with Laplace Smoothing Probabilities Not Adding Up

Let $c$ refer to a class (such as Positive or Negative), and let $w$ refer to a token or word.

Define

$count(w,c)$ = the number of times $w$ occurs in class $c$

$count(c)$ = the total number of words in class $c$

$P(w|c) = \frac{count(w,c)+1}{count(c)+|V|+1}$,

where $|V|$ is the size of the vocabulary (the number of distinct words in the training set).

In particular, any unknown word will have probability
$\frac{1}{count(c)+|V|+1}$.

Here is my problem. Say I have the following setup:

Training Set

1: a, d, o → +

2: a, g, w → +

3: d, r, w → -

So the vocabulary is $\{a, d, g, o, r, w\}$ and $|V| = 6$. The negative class contains only document 3, so $count(-) = 3$.
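A minimal Python sketch of how these counts can be tallied from the training set above. The encoding of the documents and the names `training_set`, `word_counts`, `class_totals` and `vocab` are illustrative assumptions, not part of the original post.

```python
from collections import Counter, defaultdict

# Hypothetical encoding of the three training documents: (tokens, class label)
training_set = [
    (["a", "d", "o"], "+"),
    (["a", "g", "w"], "+"),
    (["d", "r", "w"], "-"),
]

word_counts = defaultdict(Counter)  # word_counts[c][w] = count(w, c)
class_totals = Counter()            # class_totals[c]   = count(c)
vocab = set()                       # distinct words seen in any class

for tokens, label in training_set:
    word_counts[label].update(tokens)
    class_totals[label] += len(tokens)
    vocab.update(tokens)

print(sorted(vocab), len(vocab))  # ['a', 'd', 'g', 'o', 'r', 'w'] 6
print(class_totals["-"])          # 3
print(word_counts["-"]["a"])      # 0 (a Counter returns 0 for missing keys)
```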

But when I apply this formula to the negative class, the probabilities don't add up to 1:

$P(a|-) = \frac{0+1}{3+6+1} = 0.1$

$P(d|-) = \frac{1+1}{3+6+1} = 0.2$

$P(o|-) = \frac{0+1}{3+6+1} = 0.1$

$P(g|-) = \frac{0+1}{3+6+1} = 0.1$

$P(w|-) = \frac{1+1}{3+6+1} = 0.2$

$P(r|-) = \frac{1+1}{3+6+1} = 0.2$
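The arithmetic above can be reproduced with exact fractions. A minimal sketch, assuming the negative-class counts read off the training set ($count(-) = 3$, $|V| = 6$); `prob_question` is a hypothetical helper implementing the question's formula with the extra $+1$ in the denominator. It also shows where the missing $0.1$ goes: the shortfall equals the probability this formula assigns to an unknown word.

```python
from fractions import Fraction

# count(w, -) for every vocabulary word, read off training document 3 (d, r, w)
neg_counts = {"a": 0, "d": 1, "o": 0, "g": 0, "w": 1, "r": 1}
count_neg = sum(neg_counts.values())  # count(-) = 3
vocab_size = len(neg_counts)          # |V| = 6

def prob_question(count_wc, count_c, v):
    """The question's estimate: (count(w,c) + 1) / (count(c) + |V| + 1)."""
    return Fraction(count_wc + 1, count_c + v + 1)

probs = {w: prob_question(n, count_neg, vocab_size) for w, n in neg_counts.items()}
total = sum(probs.values())
unknown = prob_question(0, count_neg, vocab_size)  # an unseen word: count(w,-) = 0

print(total)            # 9/10
print(unknown)          # 1/10
print(total + unknown)  # 1
```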

These sum to $0.9$, not $1$. Am I doing something wrong here?

Best Answer

The equation for $P(w|c)$ should instead be

$P(w|c) = \frac{count(w,c)+1}{count(c)+|V|}$

where $|V|$ is the number of distinct words in the vocabulary. With this correction, the denominator for the negative class becomes $3+6 = 9$, and all your probabilities add up to $1$, as desired.
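Why the correction works: summing the numerator over the vocabulary gives $\sum_{w \in V} (count(w,c)+1) = count(c) + |V|$, because every word occurrence in class $c$ is counted once and each of the $|V|$ vocabulary words receives one extra pseudo-count. That is exactly the denominator, so the probabilities sum to $1$. A minimal Python check on the negative class, assuming the same counts as in the question (`prob_laplace` is a hypothetical helper name):

```python
from fractions import Fraction

# count(w, -) for every vocabulary word (negative class = document 3: d, r, w)
neg_counts = {"a": 0, "d": 1, "o": 0, "g": 0, "w": 1, "r": 1}
count_neg = sum(neg_counts.values())  # count(-) = 3
vocab_size = len(neg_counts)          # |V| = 6

def prob_laplace(count_wc, count_c, v):
    """Add-one (Laplace) estimate: (count(w,c) + 1) / (count(c) + |V|)."""
    return Fraction(count_wc + 1, count_c + v)

probs = {w: prob_laplace(n, count_neg, vocab_size) for w, n in neg_counts.items()}
for w, p in probs.items():
    print(w, p)  # one line per word: a 1/9, d 2/9, o 1/9, g 1/9, w 2/9, r 2/9
print(sum(probs.values()))  # 1
```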
