MCMC Algorithms: Intuition and Steps

When you need to generate samples from a probability distribution, but direct sampling is impractical or impossible, Markov Chain Monte Carlo (MCMC) methods offer a powerful solution. The central goal of MCMC is to construct a Markov chain whose stationary distribution matches the target distribution you want to sample from. Over time, samples drawn from this chain approximate samples from the true distribution, even if the distribution is high-dimensional or has a complex shape.

The Metropolis-Hastings algorithm is one of the most widely used MCMC methods. Here’s how it works, step by step:

  1. Start with an initial point, often chosen randomly or based on prior knowledge;
  2. Propose a new sample by perturbing the current point, using a proposal distribution (such as a normal distribution centered on the current point);
  3. Calculate the acceptance ratio, which compares the probability of the proposed point to that of the current point, adjusted for the proposal distribution (see the formula after this list);
  4. Accept the proposed point with probability equal to the acceptance ratio (capped at 1). If not accepted, stay at the current point and record it again as the next sample;
  5. Repeat steps 2–4 for many iterations, building a sequence (chain) of samples.
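
Concretely, writing p for the (possibly unnormalized) target density and q for the proposal density, the acceptance probability used in step 4 is:

\alpha(x, x') = \min\left(1, \frac{p(x')\, q(x \mid x')}{p(x)\, q(x' \mid x)}\right)

For a symmetric proposal such as a normal distribution centered on the current point, q(x | x') = q(x' | x), so the q terms cancel and the ratio reduces to p(x') / p(x). Because only a ratio of densities is ever needed, the normalizing constant of p never has to be computed.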

This iterative process allows you to explore the target distribution, even when it is not possible to sample from it directly. The quality and efficiency of the sampling depend on the choice of proposal distribution and how well the Markov chain explores the space.

import math
import random

random.seed(42)


def target_density(x: float) -> float:
    """Unnormalized target density: mixture of two Gaussians."""
    a = math.exp(-0.5 * ((x + 2) / 1.0) ** 2)        # N(-2, 1)
    b = 0.7 * math.exp(-0.5 * ((x - 2) / 0.6) ** 2)  # 0.7 * N(2, 0.6)
    return a + b


def propose_new_state(current: float, step_scale: float = 1.0) -> float:
    """Random-walk proposal: x' = x + Normal(0, step_scale)."""
    return current + random.gauss(0.0, step_scale)


def metropolis_hastings(
    initial_state: float,
    num_steps: int,
    step_scale: float = 1.0,
    print_every: int = 1
):
    current = initial_state
    samples = []
    # Symmetric proposal -> q cancels out of the acceptance ratio
    for step in range(1, num_steps + 1):
        proposal = propose_new_state(current, step_scale)
        p_current = target_density(current)
        p_proposal = target_density(proposal)
        # For a symmetric proposal: acceptance_ratio = p_proposal / p_current
        acceptance_ratio = p_proposal / p_current if p_current > 0 else float("inf")
        acceptance_prob = min(1.0, acceptance_ratio)
        u = random.random()
        accepted = u < acceptance_prob
        if accepted:
            current = proposal
        samples.append(current)
        if step % print_every == 0:
            status = "ACCEPT" if accepted else "REJECT"
            bar_len = 20
            filled = int(round(acceptance_prob * bar_len))
            bar = "█" * filled + "·" * (bar_len - filled)
            print(
                f"step {step:03d} | {status:6} | "
                f"current={current:7.3f} | proposal={proposal:7.3f} | "
                f"p_cur={p_current:8.5f} | p_prop={p_proposal:8.5f} | "
                f"α={acceptance_prob:6.3f} [{bar}] | u={u:6.3f}"
            )
    return samples


samples = metropolis_hastings(
    initial_state=0.0, num_steps=30, step_scale=1.0, print_every=1
)
print("\nFirst 10 samples:", [round(s, 3) for s in samples[:10]])
print("Last 10 samples:", [round(s, 3) for s in samples[-10:]])

When working with MCMC, you need to consider three important concepts: mixing, burn-in, and convergence. Mixing refers to how quickly the Markov chain explores all relevant regions of the target distribution. Good mixing means the chain moves efficiently and does not get stuck in one area. Burn-in is the initial set of samples you discard because the chain has not yet reached the stationary distribution — these early samples may be biased by the starting point. Convergence is the point at which the distribution of your samples closely matches the true target distribution; only after convergence do your samples become reliable for inference.
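
As a minimal sketch of what this looks like in practice, the snippet below reuses the metropolis_hastings function from the example above, discards an illustrative burn-in of 1,000 samples, and compares the means of the two halves of the remaining chain as a crude stationarity check. The specific numbers here are arbitrary choices for demonstration, not recommendations.

# A crude burn-in and convergence check, reusing metropolis_hastings
# from the example above. print_every is set very high to silence the log.
raw_samples = metropolis_hastings(
    initial_state=0.0, num_steps=5000, step_scale=1.0, print_every=10**9
)

burn_in = 1000  # illustrative; in practice, inspect trace plots to choose this
kept = raw_samples[burn_in:]

# If the chain has reached its stationary distribution, statistics of the
# first and second halves of the retained samples should roughly agree.
half = len(kept) // 2
mean_first = sum(kept[:half]) / half
mean_second = sum(kept[half:]) / (len(kept) - half)
print(f"mean of first half:  {mean_first:.3f}")
print(f"mean of second half: {mean_second:.3f}")

A large disagreement between the two halves suggests the chain has not converged yet and needs more iterations, a longer burn-in, or a better-tuned proposal.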

Despite its power, MCMC comes with some common pitfalls and assumptions. The chain must be able to reach every part of the distribution (ergodicity), and the proposal distribution must be chosen carefully to ensure good mixing. If the chain mixes poorly or you underestimate the burn-in period, your samples may not represent the true distribution. Additionally, MCMC assumes you can evaluate the target distribution up to a normalizing constant, but not always sample from it directly. Always check diagnostics to ensure your MCMC run is valid and your assumptions hold.
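
To make the proposal-tuning pitfall concrete, here is a small, hypothetical experiment; it assumes target_density and propose_new_state from the example above are defined, and the step scales 0.1, 1.0, and 5.0 are arbitrary illustrations. It measures how the acceptance rate changes with the proposal's step size: very high acceptance usually means tiny, slow-mixing steps, while very low acceptance means most proposals are wasted.

# Hypothetical helper: estimate the acceptance rate for a given step scale.
def acceptance_rate(step_scale: float, num_steps: int = 5000) -> float:
    current = 0.0
    accepted = 0
    for _ in range(num_steps):
        proposal = propose_new_state(current, step_scale)
        # Symmetric proposal, so the acceptance ratio is a density ratio
        ratio = target_density(proposal) / target_density(current)
        if random.random() < min(1.0, ratio):
            current = proposal
            accepted += 1
    return accepted / num_steps

for scale in (0.1, 1.0, 5.0):
    print(f"step_scale={scale:4.1f} -> acceptance rate {acceptance_rate(scale):.2f}")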

