I am trying to understand Naive Bayes. One of the principles of this method is to assume independence across features in the datapoints.

Given this assumption are two distinct features independent given a class C, or are they when marginalizing over C? Or both?

I am a bit confused, haven't studied probability in a while and I am trying to remember how the concepts relate here.


#### Best Answer

Naïve Bayes assumes *conditional* independence. Given an observation's class $C$, we assume that the observation's features are independent, i.e. $$X_i \perp X_j \mid C$$ provided $i \neq j$.
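Equivalently, this assumption makes the class-conditional joint likelihood factor into a product of per-feature likelihoods, which is what makes the model tractable:

$$P(X_1, \dots, X_n \mid C) = \prod_{i=1}^{n} P(X_i \mid C).$$

Each factor $P(X_i \mid C)$ is estimated on its own, so you never have to model interactions between features within a class.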

The canonical example is spam detection. An email's class $C$ may be either "spam" or "not spam". If we use the bag-of-words model, the features are the word counts: $X_i$ would be the number of occurrences of word $i$ in the email.

Hopefully it's clear that unconditional independence ($X_i \perp X_j$ when $i \neq j$) is too strong an assumption; if you observe a high count of one spammy word in an email, you would expect high counts of other spammy words in the same email, so there's strong correlation of features. But class-conditional independence is more reasonable, since a lot of the dependence among features can be explained away by the underlying class.
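To make the factorization concrete, here's a minimal multinomial Naive Bayes sketch on toy bag-of-words data (the emails and labels are made up for illustration; in practice you'd reach for something like scikit-learn's `MultinomialNB`):

```python
import math
from collections import Counter

# Toy labeled corpus (hypothetical data for illustration only).
train = [
    ("spam",     "win money now win prize".split()),
    ("spam",     "free money free offer".split()),
    ("not_spam", "meeting agenda attached see notes".split()),
    ("not_spam", "lunch meeting tomorrow notes".split()),
]

# Per-class word counts, class frequencies, and the vocabulary.
word_counts = {"spam": Counter(), "not_spam": Counter()}
class_counts = Counter()
vocab = set()
for label, words in train:
    class_counts[label] += 1
    word_counts[label].update(words)
    vocab.update(words)

def log_posterior(words, label):
    """log P(C) + sum_i log P(word_i | C), with Laplace smoothing.
    Summing per-word log-probabilities is exactly the
    conditional-independence assumption X_i independent of X_j given C."""
    total = sum(word_counts[label].values())
    lp = math.log(class_counts[label] / sum(class_counts.values()))
    for w in words:
        lp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
    return lp

def classify(text):
    words = text.split()
    return max(word_counts, key=lambda c: log_posterior(words, c))

print(classify("free money prize"))  # classified as spam on this toy data
print(classify("meeting notes"))     # classified as not_spam
```

Note that nothing here models co-occurrence of words within a class; each word contributes an independent term to the log-posterior, which is the naive assumption doing its work.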

The conditional-independence assumption is built into the model once and for all, so I'm not entirely sure what you're asking with regard to "marginalizing over $C$" — but note that conditional independence given $C$ does *not* imply independence after marginalizing over $C$; as the spam example shows, the features are generally dependent unconditionally.
