Metropolis Sampling
Learn about Metropolis and Metropolis-Hastings MCMC sampling methods.
Overview
The Metropolis and Metropolis-Hastings algorithms are foundational Markov chain Monte Carlo (MCMC) methods: powerful computational techniques for sampling from complex probability distributions.
Key Concepts
- Target Distribution: The distribution we want to sample from
- Proposal Distribution: A distribution used to suggest new states
- Markov Chain: A sequence of random variables where each depends only on the previous one
- Acceptance Ratio: Determines whether to accept or reject proposed moves
The Metropolis Algorithm
The Metropolis sampling algorithm is a method for generating samples from a probability distribution that might be hard to sample from directly. Here's the intuition: imagine you're exploring a hilly landscape in fog, and you want to spend more time at higher elevations (higher probability regions).
At each step, you:
- Propose a random move from where you currently stand (e.g., a step in a random direction).
- If the new spot is higher (higher probability), always accept the move.
- If the new spot is lower, accept the move only sometimes, with probability equal to the ratio of the new height to the current one. A slightly lower spot is often accepted; a much lower spot is rarely accepted.
By repeating this over many steps, the places you visit will naturally form a collection of samples that match the target distribution — you'll spend more time in high-probability regions and less time in low-probability ones. The key insight is that you never need to know the full, normalized probability distribution. You only need to compare the relative probabilities of two points, which makes it incredibly useful for complex distributions in physics, statistics, and machine learning.
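In symbols, the rule described above accepts a proposed move from the current point x to a new point x' with probability

```latex
\alpha(x \to x') = \min\!\left(1, \frac{p(x')}{p(x)}\right)
```

Uphill moves (p(x') ≥ p(x)) give α = 1 and are always taken; downhill moves are taken with probability equal to the height ratio.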
The Metropolis algorithm uses a symmetric proposal distribution, one for which q(x' | x) = q(x | x'), to generate samples from a target distribution.
You need to know a function that is proportional to the probability distribution, but not necessarily the full, normalized form. This is a crucial distinction. In practice, this means you need to be able to evaluate some function f(x) where:
p(x)=f(x)/Z
You need to know f(x), but you don't need to know the normalizing constant Z (which is often an intractable integral). This is because the algorithm only ever uses ratios of probabilities and Z cancels out completely.
This is exactly what makes Metropolis sampling so powerful. In many real-world problems — like Bayesian inference or statistical mechanics — computing the normalizing constant is the hard part (it might involve integrating over millions of dimensions). Metropolis lets you sidestep that entirely.
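A quick numerical check makes the cancellation concrete. This is a minimal sketch, using the standard normal as an illustrative target: its unnormalized density is f(x) = exp(-x²/2) and its normalizing constant is Z = √(2π).

```python
import math

def f(x):
    # Unnormalized standard-normal density: f(x) = exp(-x^2 / 2).
    return math.exp(-0.5 * x * x)

Z = math.sqrt(2.0 * math.pi)  # normalizing constant for the standard normal

def p(x):
    # Fully normalized density p(x) = f(x) / Z.
    return f(x) / Z

x_current, x_proposed = 0.5, 1.2

# The two ratios agree exactly: Z cancels, so knowing f alone is enough.
ratio_unnormalized = f(x_proposed) / f(x_current)
ratio_normalized = p(x_proposed) / p(x_current)
print(ratio_unnormalized, ratio_normalized)
```

The same cancellation happens no matter what Z is, which is why the algorithm never needs it.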
Algorithm Steps
- Start with an initial state
- For each iteration:
  - Propose a new state from the proposal distribution
  - Calculate the acceptance ratio
  - Accept or reject the proposed state based on this ratio (on rejection, the current state is kept and counted again as a sample)
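The steps above can be sketched in plain Python. This is a minimal illustration, not a production sampler; the function names and the Gaussian random-walk proposal are choices made here, and the example target is a standard normal given by its unnormalized log-density.

```python
import math
import random

def metropolis(log_f, x0, step_size, n_samples, seed=0):
    """Metropolis sampler with a symmetric Gaussian random-walk proposal.

    `log_f` is the log of a function proportional to the target
    density; the normalizing constant Z is never needed.
    """
    rng = random.Random(seed)
    x = x0
    log_fx = log_f(x)
    samples = []
    for _ in range(n_samples):
        # Propose a move from the current state (symmetric proposal).
        x_new = x + rng.gauss(0.0, step_size)
        log_f_new = log_f(x_new)
        # Accept with probability min(1, f(x_new) / f(x)),
        # computed in log space for numerical stability.
        if rng.random() < math.exp(min(0.0, log_f_new - log_fx)):
            x, log_fx = x_new, log_f_new
        samples.append(x)  # on rejection, the current state repeats
    return samples

# Example: sample a standard normal from its unnormalized log-density.
samples = metropolis(lambda x: -0.5 * x * x, x0=0.0,
                     step_size=1.0, n_samples=20000)
```

With enough iterations, the sample mean and variance should approach the target's (0 and 1 here), though early samples before the chain mixes are usually discarded as burn-in.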
The Metropolis-Hastings Algorithm
Metropolis-Hastings extends Metropolis by allowing asymmetric proposal distributions, providing greater flexibility for complex sampling problems.
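With an asymmetric proposal q(x' | x), the acceptance probability gains a correction factor so that the chain still converges to the target:

```latex
\alpha(x \to x') = \min\!\left(1, \frac{f(x')\, q(x \mid x')}{f(x)\, q(x' \mid x)}\right)
```

When the proposal is symmetric, q(x | x') = q(x' | x), the correction cancels and this reduces to the plain Metropolis rule.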
Applications
- Bayesian inference
- Parameter estimation
- Statistical physics
- Machine learning
Further Reading
- Hamiltonian Monte Carlo (HMC)
- Gibbs sampling
- PyMC3, Stan, or JAGS for practical implementations