Markov Chain Monte Carlo

Visualize the Metropolis-Hastings algorithm sampling from complex distributions.

Concept Overview

Markov Chain Monte Carlo (MCMC) is a powerful class of algorithms for sampling from complex probability distributions where direct sampling is impossible or computationally intractable. By constructing a Markov chain whose equilibrium distribution matches the desired target distribution, MCMC methods allow us to generate representative samples by simply simulating the chain for a sufficient number of steps.

Mathematical Definition

The most common MCMC algorithm is the Metropolis-Hastings algorithm. Let P(x) be the target probability distribution we wish to sample from (which need only be known up to a normalizing constant). We propose a transition from the current state x to a new state x' using a proposal distribution Q(x' | x).

A = min(1, [P(x') * Q(x | x')] / [P(x) * Q(x' | x)])

Here, A is the acceptance probability. We generate a uniform random number U ~ Uniform(0, 1). If U < A, we accept the proposed state x' as the next state. Otherwise, we reject it and remain at x. If the proposal distribution is symmetric, meaning Q(x' | x) = Q(x | x'), this simplifies to the original Metropolis algorithm:

A = min(1, P(x') / P(x))
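The acceptance rule above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production sampler: the function name `metropolis` and its parameters are our own choices, the proposal is a symmetric Gaussian random walk (so the simplified Metropolis rule applies), and we work with the log of the unnormalized target for numerical stability.

```python
import math
import random

def metropolis(log_target, x0, step, n_samples, burn_in=1000):
    """Random-walk Metropolis with a symmetric Gaussian proposal.

    log_target: log of the (unnormalized) target density P(x).
    """
    x = x0
    samples = []
    for i in range(burn_in + n_samples):
        x_new = x + random.gauss(0.0, step)         # propose x' ~ Q(x' | x)
        log_A = log_target(x_new) - log_target(x)   # log of P(x') / P(x)
        if math.log(random.random()) < log_A:       # accept with prob min(1, A)
            x = x_new                               # else: stay at x
        if i >= burn_in:
            samples.append(x)
    return samples

# Example: sample a standard normal, known only up to its constant.
random.seed(0)
draws = metropolis(lambda x: -0.5 * x * x, x0=5.0, step=1.0, n_samples=20000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
```

Note that the normalizing constant of P cancels in the ratio P(x') / P(x), which is exactly why MCMC only needs the target up to a constant.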

Key Concepts

Detailed Balance

A fundamental property that guarantees the Markov chain converges to the target distribution P(x) is detailed balance (also known as local reversibility). It states that the probability of transitioning from x to x' is exactly balanced by the probability of transitioning back from x' to x:

P(x) * T(x → x') = P(x') * T(x' → x)

where T is the transition probability. The Metropolis-Hastings acceptance rule is explicitly designed to enforce this condition.
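Detailed balance can be verified directly on a small discrete example. The sketch below (our own illustration; the three-state target and uniform proposal are arbitrary choices) builds the Metropolis transition probabilities T(x → y) and checks the balance condition for every pair of states. Unnormalized weights suffice, since the normalizing constant scales both sides equally.

```python
# Verify detailed balance for a Metropolis chain on three discrete states.
weights = [1.0, 2.0, 4.0]              # unnormalized target P(x)
n = len(weights)

def T(x, y):
    """Transition probability T(x -> y) under a uniform symmetric proposal."""
    if x == y:
        # probability of staying = 1 - total probability of moving away
        return 1.0 - sum(T(x, z) for z in range(n) if z != x)
    propose = 1.0 / (n - 1)            # Q(y | x), symmetric
    accept = min(1.0, weights[y] / weights[x])
    return propose * accept

for x in range(n):
    for y in range(n):
        lhs = weights[x] * T(x, y)     # P(x) * T(x -> y)
        rhs = weights[y] * T(y, x)     # P(y) * T(y -> x)
        assert abs(lhs - rhs) < 1e-12
```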

Burn-in Phase

The initial state of the Markov chain is often chosen arbitrarily. It takes time for the chain to forget its starting position and reach its stationary distribution. The samples collected during this initial phase, known as the "burn-in" period, are typically discarded because they do not accurately represent the target distribution.

Acceptance Rate

The efficiency of an MCMC sampler depends heavily on the proposal distribution. If proposals are too small, the chain accepts almost all of them but explores the space very slowly (random-walk behavior). If proposals are too large, most are rejected and the chain gets "stuck" at its current state for long stretches. For random-walk Metropolis with a Gaussian proposal, theory suggests an optimal acceptance rate near 23.4% as the dimension grows large, while in one-dimensional problems a rate around 44% is considered ideal.
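This trade-off is easy to observe empirically. The sketch below (our own experiment; the step sizes 0.05, 50, and 2.4 are illustrative choices for a standard-normal target) measures the acceptance rate at each extreme and near the one-dimensional sweet spot.

```python
import math
import random

def acceptance_rate(step, n=20000):
    """Fraction of accepted proposals for a standard-normal target."""
    random.seed(42)
    x, accepted = 0.0, 0
    for _ in range(n):
        x_new = x + random.gauss(0.0, step)
        if math.log(random.random()) < 0.5 * (x * x - x_new * x_new):
            x, accepted = x_new, accepted + 1
    return accepted / n

tiny = acceptance_rate(0.05)   # near 1: almost everything accepted, slow mixing
huge = acceptance_rate(50.0)   # near 0: almost everything rejected, chain stuck
tuned = acceptance_rate(2.4)   # roughly the 1-D sweet spot, around 44%
```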

Historical Context

MCMC originated in the late 1940s at Los Alamos (then the Los Alamos Scientific Laboratory). Fresh from their Manhattan Project work, physicists Stanislaw Ulam, John von Neumann, and Nicholas Metropolis sought to compute complex integrals arising in nuclear physics. The original Metropolis algorithm was published in 1953 in the paper "Equation of State Calculations by Fast Computing Machines." In 1970, W. Keith Hastings generalized the algorithm to handle asymmetric proposal distributions, creating the Metropolis-Hastings algorithm used ubiquitously today.

Real-world Applications

  • Bayesian Inference: Estimating posterior distributions of complex hierarchical models where the normalizing constant (the evidence) is intractable to compute analytically.
  • Statistical Physics: Simulating systems of interacting particles (like the Ising model) to study phase transitions and thermodynamic properties.
  • Machine Learning: Training complex probabilistic models, such as Deep Belief Networks or sampling in energy-based models.
  • Computational Biology: Reconstructing phylogenetic trees from DNA sequences, where the space of all possible trees is exponentially large.

Related Concepts

  • Monte Carlo Simulation — the broader class of randomized approximation algorithms.
  • Probability Distributions — the theoretical foundation of target and proposal distributions.
  • Random Walk — the underlying mechanism by which many basic MCMC proposals explore space.

Experience it interactively

Adjust parameters, observe in real time, and build deep intuition with Riano’s interactive Markov Chain Monte Carlo module.

Try Markov Chain Monte Carlo on Riano →
