Solved – Understanding MCMC: what would the alternative be?

I'm learning Bayesian stats for the first time; as an angle toward understanding MCMC, I wondered: is it doing something that fundamentally can't be done any other way, or is it just doing something far more efficiently than the alternatives?

By way of illustration, suppose we're trying to compute the probability of our parameters given the data, $P(x,y,z|D)$, given a model that computes the opposite, $P(D|x,y,z)$. To calculate this directly with Bayes' theorem we need the denominator $P(D) = \int P(D|x,y,z)\,P(x,y,z)\,dx\,dy\,dz$, as pointed out here. But could we compute that by integration, say as follows:

```python
# Riemann sum over a 3-D parameter grid (implicitly assumes a uniform
# prior over the grid). Python's built-in range() rejects float steps,
# so np.arange is used instead.
import numpy as np

p_d = 0.0
for x in np.arange(xmin, xmax, dx):
    for y in np.arange(ymin, ymax, dy):
        for z in np.arange(zmin, zmax, dz):
            p_d_given_x_y_z = cdf(model(x, y, z), d)  # P(D | x, y, z) under the model
            p_d += p_d_given_x_y_z * dx * dy * dz
```

Would that work (albeit very inefficiently as the number of variables grows), or is there something else that would cause this approach to fail?

You are describing a grid approximation to the posterior, and that is a valid approach, although not the most popular one. There are quite a few cases in which the posterior distribution can be computed analytically. Markov chain Monte Carlo (MCMC), and other approximate methods, are ways to obtain samples from the posterior distribution, and they often work when an analytical solution cannot be found.

The analytical solutions that can be found typically involve "conjugate" families; see for example https://en.wikipedia.org/wiki/Conjugate_prior.

As a first example, if your prior on $p$ is uniform on $[0, 1]$, where $p$ is the success probability in a simple binomial experiment, the posterior is a Beta distribution, specifically $\text{Beta}(1 + \text{successes},\ 1 + \text{failures})$. Integration, or summation, can be done explicitly in this case.
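For concreteness, here is a minimal sketch of that conjugate case in Python (the counts, 14 successes in 20 trials, are made-up illustrative data). A uniform prior is $\text{Beta}(1, 1)$, so the posterior after $k$ successes in $n$ trials is $\text{Beta}(1+k,\ 1+n-k)$; no numerical integration is needed:

```python
# Conjugate Beta-Binomial posterior: available in closed form.
from scipy import stats

n, k = 20, 14                      # hypothetical data: 14 successes in 20 trials
posterior = stats.beta(1 + k, 1 + (n - k))

print(posterior.mean())            # posterior mean of p, ~0.68
print(posterior.interval(0.95))    # central 95% credible interval
```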

If you have finitely many parameter choices, or you use a grid approximation as in your example, a simple summation may be all you need. However, the number of computations explodes quickly once you have more than a couple of variables and want a dense grid: $m$ grid points per dimension cost $m^k$ likelihood evaluations for $k$ parameters.
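As an illustration, here is what the grid approach looks like for the same made-up binomial example (a one-dimensional sketch, not a recommendation). The normalizing sum plays the role of $P(D)$ in Bayes' theorem:

```python
# Grid approximation: evaluate likelihood * prior on a grid, then normalize.
import numpy as np
from scipy import stats

n, k = 20, 14
p_grid = np.linspace(0.001, 0.999, 1000)       # grid over the parameter p
prior = np.ones_like(p_grid)                   # uniform prior
likelihood = stats.binom.pmf(k, n, p_grid)     # P(D | p) at each grid point

unnormalized = likelihood * prior
posterior = unnormalized / unnormalized.sum()  # divide by the sum, which stands in for P(D)

print(p_grid[np.argmax(posterior)])            # posterior mode, ~0.7
```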

There are several algorithms for sampling from the posterior. Hamiltonian Monte Carlo, specifically the NUTS sampler, is now popular and is used in Stan and PyMC3; Metropolis-Hastings is the classic. Variational inference is a relative newcomer; it is not actually a sampling method but a different way of obtaining an approximation. At the moment no single method, analytical solutions included, is best across the board; each works well in specific cases.
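To make the MCMC idea concrete, here is a minimal random-walk Metropolis-Hastings sketch for the same made-up binomial example (the proposal scale 0.1, the seed, and the iteration counts are arbitrary illustrative choices). The key point is that only the unnormalized posterior is needed: the intractable $P(D)$ cancels in the acceptance ratio.

```python
# Random-walk Metropolis-Hastings on the success probability p.
import numpy as np
from scipy import stats

n, k = 20, 14

def log_unnormalized_posterior(p):
    if not 0.0 < p < 1.0:
        return -np.inf                            # uniform prior on (0, 1)
    return stats.binom.logpmf(k, n, p)            # log-likelihood; P(D) never appears

rng = np.random.default_rng(0)
p_current = 0.5
samples = []
for _ in range(10_000):
    p_proposal = p_current + rng.normal(0, 0.1)   # symmetric random-walk proposal
    log_ratio = (log_unnormalized_posterior(p_proposal)
                 - log_unnormalized_posterior(p_current))
    if np.log(rng.uniform()) < log_ratio:         # accept with prob min(1, ratio)
        p_current = p_proposal
    samples.append(p_current)

print(np.mean(samples[1000:]))                    # posterior mean after burn-in, ~0.68
```

Working in log space is the usual design choice here: it avoids numerical underflow when likelihoods are tiny, which they often are with real data.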
