I have a simple question – I think.
I have recently read a paper:

The paper uses a one-class naive Bayes. My question is: can I do the same as this one-class multinomial naive Bayes, but using a Gaussian distribution instead?
The paper above used a threshold to identify the class of interest in a test dataset.
If I make the following assumptions:

- the standard deviation is greater than one for my features in the training data,
- the score of each sample is the sum of the logs of the Gaussian pdfs over all its features,

could I then use a threshold, some number of standard deviations derived from the normal (maybe 3), to identify data points that are close to my one-class training data?
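Concretely, something like this minimal sketch (the helper names, and the choice to apply the 3-standard-deviation cutoff to the distribution of training log-scores rather than to the raw features, are my own assumptions, not taken from the paper):

```python
import numpy as np
from scipy.stats import norm

def fit_one_class_gaussian(X_train):
    """Estimate a per-feature mean and standard deviation from the one-class data."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0, ddof=1)
    return mu, sigma

def log_score(X, mu, sigma):
    """Sum of log Gaussian pdfs over all features, one score per sample."""
    return norm.logpdf(X, loc=mu, scale=sigma).sum(axis=1)

rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(200, 4))  # toy one-class training data

mu, sigma = fit_one_class_gaussian(X_train)
train_scores = log_score(X_train, mu, sigma)

# One possible reading of the "maybe 3" idea: accept a test point if its
# log-score is within 3 standard deviations of the mean training log-score.
threshold = train_scores.mean() - 3.0 * train_scores.std(ddof=1)

X_test = rng.normal(loc=5.0, scale=2.0, size=(10, 4))
accepted = log_score(X_test, mu, sigma) >= threshold
print(accepted)
```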
Best Answer
According to the paper "One-class document classification via Neural Networks" by Manevitz and Yousef, it seems possible to construct a one-class Naive Bayes classifier, even without a standard deviation.
I quote the relevant passage, where the authors describe how to implement the core of the classifier:
We calculate $p(d|E)$ as the product of $p(w|E)$ for all words in the dictionary that appear in the document $d$. Each of the $p(w|E)$ is estimated independently using the formula:
$p(w|E) = \dfrac{n_w + 1}{n + |\text{dictionary}|}$,

where $n_w$ is the number of times word $w$ occurs in $E$, and $n$ is the total number of words in $E$. We calculate a threshold $\delta$ by the minimum over all examples in $E$, of the value $p(d|E)$ for each document in the set of examples. Then we experiment with values $\lambda\cdot\delta$ for $0 < \lambda \leq 1$ as in the previous algorithms using $F_1$ to find the optimal threshold for acceptance. That is, given a new document $d$, we accept it if the calculated value $p(d|E)$ is larger than the determined $\lambda\cdot\delta$. For this classifier algorithm we store $\delta$ and $\lambda$.
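Read literally, the quoted rule fits in a few lines. Below is a minimal sketch in Python under two assumptions of mine: the dictionary is taken to be the training vocabulary, and the comparison against $\lambda\cdot\delta$ is done in log space to avoid numerical underflow:

```python
from collections import Counter
import math

def train_one_class_nb(documents):
    """documents: list of token lists from the positive example set E."""
    counts = Counter(w for doc in documents for w in doc)
    n = sum(counts.values())   # total number of words in E
    denom = n + len(counts)    # n + |dictionary| (dictionary = training vocabulary here)
    # p(w|E) = (n_w + 1) / (n + |dictionary|), stored as log probabilities
    return {w: math.log((c + 1) / denom) for w, c in counts.items()}

def log_p_d_given_E(doc, log_p):
    """log p(d|E): sum of log p(w|E) over dictionary words appearing in d.
    Words outside the dictionary are skipped, following the quoted passage."""
    return sum(log_p[w] for w in doc if w in log_p)

docs = [["win", "offer", "now"], ["offer", "now"], ["win", "now", "now"]]
log_p = train_one_class_nb(docs)

# delta = minimum of p(d|E) over the training documents, computed in log space
log_delta = min(log_p_d_given_E(d, log_p) for d in docs)

lam = 0.5                                  # lambda would be tuned via F_1 in the paper
log_threshold = math.log(lam) + log_delta  # log(lambda * delta)

new_doc = ["win", "offer"]
accept = log_p_d_given_E(new_doc, log_p) > log_threshold
print(accept)
```

Since $p(d|E) > \lambda\cdot\delta$ holds exactly when $\log p(d|E) > \log\lambda + \log\delta$, working with log-scores changes nothing about the decision rule itself.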
A more detailed picture of the algorithm is given in the doctoral dissertation Characteristic Concept Representations by Piew Datta.