I understand that a likelihood differs from a probability distribution: the likelihood describes the plausibility of parameter values given the data you have observed (it is essentially a function of the parameters for fixed, observed data), while a probability distribution describes the probability of observing certain values given fixed parameter values. But what is a marginal likelihood, and how does it relate to posterior distributions? (Preferably explained with little or no probability notation, so that the explanation is more intuitive.) Any examples would be great as well.
Best Answer
In Bayesian statistics, the marginal likelihood $$m(x) = \int_\Theta f(x|\theta)\,\pi(\theta)\,\mathrm{d}\theta$$ where
- $x$ is the sample
- $f(x|\theta)$ is the sampling density, which is proportional to the model likelihood
- $\pi(\theta)$ is the prior density
is a misnomer in that
- it is not a likelihood function [as a function of the parameter], since the parameter is integrated out (i.e., the likelihood function is averaged against the prior measure),
- it is a density in the observations, the predictive density of the sample,
- it is not defined up to a multiplicative constant,
- it does not solely depend on sufficient statistics.
Other names for $m(x)$ are the evidence, the prior predictive, and the partition function. It nonetheless plays several important roles:
- it is the normalising constant of the posterior distribution $$\pi(\theta|x) = \dfrac{f(x|\theta)\,\pi(\theta)}{m(x)}$$
- in model comparison, it is the contribution of the data to the posterior probability of the associated model, and the numerator or the denominator of the Bayes factor.
- it is a measure of the goodness-of-fit of a model to the data $x$, in that $-2\log m(x)$ is asymptotically equivalent to the BIC (Bayesian information criterion) of Schwarz (1978).
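To make the definition concrete, here is a small numerical sketch (not part of the original answer): a binomial model with a uniform Beta(1,1) prior on the success probability $\theta$, observing $k = 7$ successes in $n = 10$ trials. The marginal likelihood is approximated by integrating $f(x|\theta)\,\pi(\theta)$ over $\theta$ with a midpoint rule; under this particular prior the exact answer happens to be $1/(n+1)$, and the same constant is exactly what makes the posterior density integrate to 1.

```python
from math import comb

def likelihood(theta, k, n):
    # Binomial sampling density f(x | theta) for k successes in n trials
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

def marginal_likelihood(k, n, grid=10_000):
    # m(x) = integral of f(x|theta) * pi(theta) over [0, 1],
    # with a uniform Beta(1,1) prior (pi(theta) = 1),
    # approximated by a midpoint Riemann sum
    h = 1.0 / grid
    return sum(likelihood((i + 0.5) * h, k, n) * h for i in range(grid))

k, n = 7, 10
m = marginal_likelihood(k, n)
print(m)            # numerically close to the exact value 1/(n+1)
print(1 / (n + 1))

# m(x) is the normalising constant of the posterior:
# f(x|theta) * pi(theta) / m(x) integrates to 1 over theta
h = 1.0 / 10_000
total = sum(likelihood((i + 0.5) * h, k, n) / m * h for i in range(10_000))
print(total)        # close to 1.0
```

Note that $m(x)$ here is a single number for the observed data, not a function of $\theta$: the parameter has been averaged out against the prior, which is exactly why it is "marginal".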
See also: Normalizing constant in Bayes theorem