I am a beginner in statistics. I have turned my attention to it because I wish to understand Deep Bayesian Learning, so I am starting with the basics. The question is:

The Bayes rule equation is given by

$P(X \mid Y) = \frac{P(Y \mid X)\, P(X)}{P(Y)}$

But I have noticed that in some places the denominator is ignored entirely and only the numerator of the right-hand side is used,

making it:

$P(X \mid Y) = P(Y \mid X)\, P(X)$

Is there some special case where we can ignore $P(Y)$, such as when $P(Y) = 1$? But if that were the case, wouldn't everything become very easy: $P(Y \mid X)$ would also be 1, so $P(X \mid Y)$ would just be $P(X)$, and we'd be done.


#### Best Answer

Indeed, ignoring $P(Y)$ is very common. It is not done because $P(Y)$ equals one; it is done when we are only interested in which value of $X$ is more probable, as in classification.

For the positive class, $P(X=\text{Positive} \mid Y) = \frac{P(Y \mid X=\text{Positive})\, P(X=\text{Positive})}{P(Y)}$.

For the negative class, $P(X=\text{Negative} \mid Y) = \frac{P(Y \mid X=\text{Negative})\, P(X=\text{Negative})}{P(Y)}$.

If we only want to know whether $P(X=\text{Positive} \mid Y) > P(X=\text{Negative} \mid Y)$, we can omit $P(Y)$ from both expressions: it is the same positive constant in each, so dividing by it does not change the comparison.
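A small sketch may make this concrete. The numbers below are made up for illustration (a hypothetical binary test where $X$ is the disease state and $Y$ is an observed symptom); the point is that the class comparison never needs $P(Y)$, while normalizing the unnormalized scores by their sum recovers $P(Y)$ and the exact posteriors:

```python
# Hypothetical numbers: prior P(X) and likelihood P(Y | X)
p_x = {"Positive": 0.01, "Negative": 0.99}          # prior over disease state
p_y_given_x = {"Positive": 0.95, "Negative": 0.05}  # chance of symptom Y per state

# Unnormalized posteriors: P(Y | X) * P(X), with P(Y) omitted
scores = {x: p_y_given_x[x] * p_x[x] for x in p_x}

# Classification only needs the comparison, so P(Y) never enters:
best = max(scores, key=scores.get)

# If we do want exact posteriors, the law of total probability gives
# P(Y) = sum over x of P(Y | x) P(x), i.e. the sum of the scores:
p_y = sum(scores.values())
posterior = {x: s / p_y for x, s in scores.items()}
```

Dividing every score by the same positive constant `p_y` rescales them but cannot change which one is largest, which is exactly why the denominator is dropped in classification.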

Think of a medical example. A person, Y, walks into a physician's office, and the physician would like to know whether Y suffers from X. $P(Y)$ is the probability that Y steps into the office at all. Y might be very common or very rare, but it doesn't matter: he is already in the office.

On top of that, measuring $P(Y)$ is no longer a medical question; the physician's knowledge cannot help with it. We therefore prefer to avoid coping with an unnecessarily hard problem.

Note that if you would like to be precise, after omitting $P(Y)$ you should write $P(X=\text{Positive} \mid Y) \propto P(Y \mid X=\text{Positive})\, P(X=\text{Positive})$, using proportionality instead of equality.
