# Solved – Most Informative Features with Naive Bayes

Anyone know how to calculate the most informative features where the attributes are normally distributed using Naive Bayes?

My understanding, at least for binary attributes, is that you compute max(Pr(feature=1 | class)) / min(Pr(feature=1 | class)), where the max and min are taken over the class labels. This gives you the informativeness of feature=1 across the class labels.
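To make that concrete, here is a minimal sketch of the ratio I mean; the feature names and probabilities are made-up placeholders, not fitted values:

```python
# Hypothetical per-class probabilities Pr(feature=1 | class) for two
# binary features in a spam/ham setting (illustrative numbers only).
probs = {
    "contains_free": {"spam": 0.80, "ham": 0.05},
    "contains_meeting": {"spam": 0.10, "ham": 0.40},
}

def informativeness(feature):
    """max/min of Pr(feature=1 | class) over the class labels."""
    p = probs[feature].values()
    return max(p) / min(p)

# Rank features by this ratio, most informative first.
ranked = sorted(probs, key=informativeness, reverse=True)
print(ranked)  # ['contains_free', 'contains_meeting'] (ratios 16 vs 4)
```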

But how would you compute the most informative features using a Gaussian Naive Bayes classifier? Any sources would be much appreciated also.


You could estimate the mutual information of each feature \$i\$ with the class label (also known as the expected information gain),

\$\$I[C, X_i] = H[C] - H[C \mid X_i].\$\$

The most informative feature is the one which on average leaves the least uncertainty about \$C\$, as measured by the conditional entropy \$H[C \mid X_i]\$. We can estimate this entropy by averaging over data points:

\$\$H[C \mid X_i] \approx -\frac{1}{N} \sum_n \sum_c p(c \mid x_{ni}) \log p(c \mid x_{ni}).\$\$
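A sketch of this estimate, assuming scikit-learn's `GaussianNB` for the per-feature posteriors \$p(c \mid x_{ni})\$ and synthetic data (one discriminative feature, one pure-noise feature):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Two classes; feature 0 separates them (means 0 vs 3), feature 1 is noise.
X = np.vstack([
    np.column_stack([rng.normal(0, 1, 500), rng.normal(0, 1, 500)]),
    np.column_stack([rng.normal(3, 1, 500), rng.normal(0, 1, 500)]),
])
y = np.array([0] * 500 + [1] * 500)

def class_entropy(y):
    """H[C], estimated from the empirical class frequencies."""
    p = np.bincount(y) / len(y)
    return -np.sum(p * np.log(p))

def conditional_entropy(x, y):
    """H[C | X_i]: average posterior entropy under a 1-D GaussianNB fit."""
    model = GaussianNB().fit(x.reshape(-1, 1), y)
    post = model.predict_proba(x.reshape(-1, 1))
    return -np.mean(np.sum(post * np.log(post + 1e-12), axis=1))

H_C = class_entropy(y)
mi = [H_C - conditional_entropy(X[:, i], y) for i in range(X.shape[1])]
print(mi)  # feature 0 should score far higher than the noise feature
```

The feature with the largest estimated \$I[C, X_i]\$ is the most informative; fitting one single-feature model per \$X_i\$ keeps the estimate faithful to the Naive Bayes independence assumption.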