I keep seeing this term "mixing": when people want to show their sampler works better, they say it "mixes" better. The term is a little counter-intuitive.
Best Answer
When people say "mixing" in the context of Markov chain Monte Carlo (MCMC), they are (knowingly or unknowingly) referring to the "mixing time" of the Markov chain.
Intuitively, the mixing time of a Markov chain is the number of steps the chain needs to come close to its stationary distribution (in Bayesian statistics, the posterior distribution). If $\pi$ is the stationary distribution and $P(x,A)$ is the Markov chain transition kernel, where $x$ is the starting value of the Markov chain and $A$ is a measurable set, with $P^t$ denoting the $t$-step kernel, then the mixing time is the first time $t$ such that
$$\left\|P^t(x,A) - \pi(A)\right\|_{TV} \leq \dfrac{1}{4}. $$
Here $\|\cdot\|_{TV}$ denotes the total variation distance. This is only one of many definitions, but they all capture the same intuition.
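To make the definition concrete, here is a minimal sketch in plain Python for a chain on a finite state space, where the kernel is just a transition matrix. The two-state matrix `P`, the helper names, and the choice to take the worst case over all starting states $x$ are assumptions made for illustration; they are not part of the definition above.

```python
def tv_distance(p, q):
    """Total variation distance between two distributions on a finite set."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def step(dist, P):
    """One step of the chain: row vector `dist` times transition matrix `P`."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def mixing_time(P, pi, eps=0.25, max_steps=10_000):
    """Smallest t with max over starts x of TV(P^t(x, .), pi) <= eps."""
    n = len(P)
    # Row x holds P^t(x, .); at t = 0 this is a point mass at x.
    rows = [[1.0 if j == x else 0.0 for j in range(n)] for x in range(n)]
    for t in range(max_steps + 1):
        if max(tv_distance(row, pi) for row in rows) <= eps:
            return t
        rows = [step(row, P) for row in rows]
    raise RuntimeError("chain did not mix within max_steps")


# Hypothetical two-state chain (illustrative numbers only).
P = [[0.9, 0.1],
     [0.2, 0.8]]
pi = [2 / 3, 1 / 3]  # stationary distribution: pi P = pi

print(mixing_time(P, pi))
```

For this toy chain the worst-case distance after $t$ steps is $\tfrac{2}{3}\,(0.7)^t$, which first drops below $1/4$ at $t = 3$. For a real MCMC sampler on a continuous space the kernel cannot be written down as a matrix, so the mixing time is generally not computable this way; the sketch only illustrates what the quantity measures.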
The mixing time has a direct impact on sampling quality: the smaller the mixing time, the faster the Markov chain converges to the stationary distribution and the smaller the correlation between samples.
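The link between mixing time and correlation can be seen in a toy example. The sketch below compares two hypothetical symmetric two-state chains that share the same stationary distribution but switch state with different probabilities. The function names and the flip probabilities are assumptions chosen for illustration, and the lag-1 autocorrelation is computed exactly from the transition matrix rather than estimated from simulated draws.

```python
def tv_to_stationary(P, pi, start, t):
    """TV distance between P^t(start, .) and pi for a finite-state chain."""
    n = len(P)
    dist = [1.0 if j == start else 0.0 for j in range(n)]
    for _ in range(t):
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return 0.5 * sum(abs(d - p) for d, p in zip(dist, pi))


def mixing_time(P, pi, eps=0.25, max_steps=1_000):
    """Smallest t whose worst-case TV distance to pi is at most eps."""
    for t in range(max_steps + 1):
        if max(tv_to_stationary(P, pi, x, t) for x in range(len(P))) <= eps:
            return t
    raise RuntimeError("chain did not mix within max_steps")


def lag1_autocorrelation(P, pi):
    """Exact lag-1 autocorrelation of the state label under stationarity."""
    n = len(P)
    mean = sum(pi[i] * i for i in range(n))
    var = sum(pi[i] * i * i for i in range(n)) - mean ** 2
    cross = sum(pi[i] * P[i][j] * i * j for i in range(n) for j in range(n))
    return (cross - mean ** 2) / var


def symmetric_chain(flip):
    """Two-state chain that switches state with probability `flip`."""
    return [[1 - flip, flip], [flip, 1 - flip]]


pi = [0.5, 0.5]  # stationary distribution of every symmetric two-state chain
for flip in (0.4, 0.05):  # illustrative values: a fast and a sticky chain
    P = symmetric_chain(flip)
    print(flip, mixing_time(P, pi), round(lag1_autocorrelation(P, pi), 3))
```

With these numbers the chain that flips with probability 0.4 has mixing time 1 and lag-1 autocorrelation 0.2, while the sticky chain with flip probability 0.05 has mixing time 7 and lag-1 autocorrelation 0.9. This is a two-state illustration of the claim, not a general proof.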