Solved – the relationship between Metropolis Hastings and Simulated Annealing

Context and Problem

In the Wikipedia page for Simulated Annealing they state

The simulation can be performed either by a solution of kinetic equations for density functions[2][3] or by using the stochastic sampling method.[1][4] The method is an adaptation of the Metropolis–Hastings algorithm

I've read this in some papers as well, but none ever seems to make the connection between the two.

Metropolis-Hastings PseudoCode for Reference

Here's the Metropolis-Hastings pseudocode to sample from $pi(x)$ using a proposal $p(x'mid x_t)$. I can't really see how this relates to Simulated Annealing.

  1. Choose starting point $x_0$
  2. Until convergence:
    1. Sample a candidate $x'sim q(x'mid x_t)$
    2. With probability $A(x', x_t)$ accept $x_{t+1} = x'$, otherwise set $x_{t+1}=x_t$ where
      A(x', x_t) = minleft(frac{pi(x')}{pi(x_t)}frac{q(x_tmid x')}{q(x'mid x_t)}right)

How is MH, which is a method for sampling from a distribution, the same as a method used to find the global optimum ofa function?

Best Answer

Simulated annealing is a meta-heuristic algorithm used for optimization, that is finding the minimum/maximum of a function. Metropolis-Hastings is an algorithm used for exploring a function (finding possible values/samples).

Both algorithms are stochastic, generating new points to move to at random. Where they differ is in their acceptance/rejection criterion. Both algorithms move to a new random point with a certain probability, which is based on the difference (or the ratio) of the current and new proposed point in the search space.

Metropolis-Hastings moves to a new point based on the ratio of the current point and a new proposed random point $min(frac{new}{old},1)$, with some additional stuff for asymmetric distributions). If this ratio is bigger than 1 (the new point has higher likelihood than the current point) then it will move to this new point immediately. Otherwise, if the new point has lower likelihood, the algorithm will move to this point with some probability – based on the ratio. In this case the algorithm will generate a random value between 0 and 1, if the ratio is smaller than this values then it will reject the new point, else it will accept the new point.

Simulated annealing has an additional parameter (temperature) which scales the difference by a certain amount $exp(frac{new-old}{T})$. When the temperature is very high the difference won't have any meaningful impact on the decision (which will evaluate to something greater than 1), so the algorithm will always accept any new point, which means it will move at random. When the temperature is very low the criterion will evaluate to ~0 for worse points, so the algorithm will move deterministically, only accepting better solutions.

Similar Posts:

Rate this post

Leave a Comment