# Context and Problem

> The simulation can be performed either by a solution of kinetic equations for density functions or by using the stochastic sampling method. The method is an adaptation of the Metropolis–Hastings algorithm.

I've read this in some papers as well, but none ever seems to make the connection between the two.

# Metropolis–Hastings Pseudocode for Reference

Here's the Metropolis–Hastings pseudocode to sample from $$\pi(x)$$ using a proposal $$q(x' \mid x_t)$$. I can't really see how this relates to simulated annealing.

1. Choose a starting point $$x_0$$.
2. Until convergence:
   1. Sample a candidate $$x' \sim q(x' \mid x_t)$$.
   2. With probability $$A(x', x_t)$$ accept $$x_{t+1} = x'$$; otherwise set $$x_{t+1} = x_t$$, where
      $$A(x', x_t) = \min\left(1, \frac{\pi(x')}{\pi(x_t)} \frac{q(x_t \mid x')}{q(x' \mid x_t)}\right)$$
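The steps above can be sketched in Python. This is a minimal illustration, not any particular paper's implementation; the target, proposal, and parameter names are my own, and the acceptance ratio is computed in log space:

```python
import math
import random

def metropolis_hastings(log_pi, propose, log_q, x0, n_steps):
    """Metropolis-Hastings sampler for a target pi known up to a constant.

    log_pi(x)    -- log target density
    propose(x)   -- draw a candidate x' ~ q(. | x)
    log_q(xp, x) -- log proposal density log q(xp | x)
    """
    x = x0
    samples = [x]
    for _ in range(n_steps):
        xp = propose(x)
        # log of A(x', x_t) = min(1, pi(x')/pi(x_t) * q(x_t|x')/q(x'|x_t)),
        # done in log space for numerical stability
        log_a = (log_pi(xp) - log_pi(x)) + (log_q(x, xp) - log_q(xp, x))
        if random.random() < math.exp(min(0.0, log_a)):
            x = xp          # accept the candidate
        samples.append(x)   # on rejection the chain repeats x_t
    return samples

# Example: sample a standard normal with a symmetric random-walk proposal,
# so the q-ratio cancels and log_q can just return a constant.
log_pi = lambda x: -0.5 * x * x
propose = lambda x: x + random.gauss(0.0, 1.0)
log_q = lambda xp, x: 0.0
chain = metropolis_hastings(log_pi, propose, log_q, 0.0, 5000)
```

Note that nothing here optimizes anything: the chain deliberately revisits low-density points so that the samples follow $$\pi$$.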

How is MH, which is a method for sampling from a distribution, the same as a method used to find the global optimum of a function?

Metropolis–Hastings moves to a new point based on the ratio of the proposed point's density to the current point's, $$\min\left(\frac{new}{old}, 1\right)$$ (with an additional correction factor for asymmetric proposal distributions). If this ratio is greater than 1 (the new point has higher likelihood than the current point), the algorithm moves to the new point immediately. Otherwise, if the new point has lower likelihood, the algorithm moves to it only with some probability based on the ratio: it draws a uniform random value between 0 and 1, rejects the new point if the ratio is smaller than this value, and accepts it otherwise.

Simulated annealing has an additional parameter, the temperature $$T$$, which scales the difference between the two points: the acceptance criterion becomes $$\exp\left(\frac{new - old}{T}\right)$$. When the temperature is very high, the difference has no meaningful impact on the decision (the criterion evaluates to something close to or greater than 1), so the algorithm accepts almost any new point, which means it moves essentially at random. When the temperature is very low, the criterion evaluates to roughly 0 for worse points, so the algorithm moves almost deterministically, accepting only better solutions.
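To make the connection concrete, here is the same accept/reject loop with a temperature folded in. This is a minimal sketch under stated assumptions: the objective, proposal, and geometric cooling schedule `T *= alpha` are illustrative choices of mine, and it is written in the text's maximization framing so the rule is literally $$\exp\left(\frac{new - old}{T}\right)$$:

```python
import math
import random

def simulated_annealing(g, propose, x0, T0, alpha, n_steps):
    """Maximize g with Metropolis-style moves at a decreasing temperature.

    Acceptance uses exp((new - old) / T) as in the text; geometric cooling
    T <- alpha * T is one common (assumed) schedule.
    """
    x, gx = x0, g(x0)
    best, gbest = x, gx
    T = T0
    for _ in range(n_steps):
        xp = propose(x)
        gxp = g(xp)
        # Better points are always accepted; worse points are accepted
        # with probability exp((new - old) / T) < 1.
        if gxp >= gx or random.random() < math.exp((gxp - gx) / T):
            x, gx = xp, gxp
            if gx > gbest:
                best, gbest = x, gx
        T *= alpha  # high T: near-random walk; low T: greedy hill climbing
    return best, gbest

# Example: maximize a 1-D function with many local maxima.
g = lambda x: -x * x + 10 * math.cos(3 * x)
propose = lambda x: x + random.gauss(0.0, 0.5)
best, gbest = simulated_annealing(g, propose, 5.0, T0=10.0, alpha=0.999, n_steps=5000)
```

With $$T$$ held fixed, the loop is just a Metropolis chain sampling from a distribution proportional to $$\exp(g(x)/T)$$; cooling $$T$$ toward 0 concentrates that distribution on the global maximum, which is the link between the sampler and the optimizer.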