MCMC Algorithms: Intuition and Steps

When you need to generate samples from a probability distribution, but direct sampling is impractical or impossible, Markov Chain Monte Carlo (MCMC) methods offer a powerful solution. The central goal of MCMC is to construct a Markov chain whose stationary distribution matches the target distribution you want to sample from. Over time, samples drawn from this chain approximate samples from the true distribution, even if the distribution is high-dimensional or has a complex shape.

The Metropolis-Hastings algorithm is one of the most widely used MCMC methods. Here’s how it works, step by step:

  1. Start with an initial point, often chosen randomly or based on prior knowledge;
  2. Propose a new sample by perturbing the current point, using a proposal distribution (such as a normal distribution centered on the current point);
  3. Calculate the acceptance ratio, which compares the probability of the proposed point to that of the current point, adjusted for the proposal distribution (see the formula after this list);
  4. Accept the proposed point with probability equal to the acceptance ratio (capped at 1). If not accepted, stay at the current point and record it again as the next sample;
  5. Repeat steps 2–4 for many iterations, building a sequence (chain) of samples.
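
Concretely, writing p for the (possibly unnormalized) target density and q for the proposal density, the acceptance probability used in step 4 is:

\alpha(x, x') = \min\left(1, \frac{p(x')\, q(x \mid x')}{p(x)\, q(x' \mid x)}\right)

For a symmetric proposal such as a normal distribution centered on the current point, q(x | x') = q(x' | x), so the q terms cancel and the ratio reduces to p(x') / p(x). Because only a ratio of densities is ever needed, the normalizing constant of p never has to be computed.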

This iterative process allows you to explore the target distribution, even when it is not possible to sample from it directly. The quality and efficiency of the sampling depend on the choice of proposal distribution and how well the Markov chain explores the space.

import math
import random

random.seed(42)


def target_density(x: float) -> float:
    """Unnormalized target density: mixture of two Gaussians."""
    a = math.exp(-0.5 * ((x + 2) / 1.0) ** 2)        # N(-2, 1)
    b = 0.7 * math.exp(-0.5 * ((x - 2) / 0.6) ** 2)  # 0.7 * N(2, 0.6)
    return a + b


def propose_new_state(current: float, step_scale: float = 1.0) -> float:
    """Random-walk proposal: x' = x + Normal(0, step_scale)."""
    return current + random.gauss(0.0, step_scale)


def metropolis_hastings(
    initial_state: float,
    num_steps: int,
    step_scale: float = 1.0,
    print_every: int = 1
):
    current = initial_state
    samples = []
    # Symmetric proposal -> q cancels out of the acceptance ratio
    for step in range(1, num_steps + 1):
        proposal = propose_new_state(current, step_scale)
        p_current = target_density(current)
        p_proposal = target_density(proposal)
        # For a symmetric proposal: acceptance_ratio = p_proposal / p_current
        acceptance_ratio = p_proposal / p_current if p_current > 0 else float("inf")
        acceptance_prob = min(1.0, acceptance_ratio)
        u = random.random()
        accepted = u < acceptance_prob
        if accepted:
            current = proposal
        samples.append(current)
        if step % print_every == 0:
            status = "ACCEPT" if accepted else "REJECT"
            bar_len = 20
            filled = int(round(acceptance_prob * bar_len))
            bar = "█" * filled + "·" * (bar_len - filled)
            print(
                f"step {step:03d} | {status:6} | "
                f"current={current:7.3f} | proposal={proposal:7.3f} | "
                f"p_cur={p_current:8.5f} | p_prop={p_proposal:8.5f} | "
                f"α={acceptance_prob:6.3f} [{bar}] | u={u:6.3f}"
            )
    return samples


samples = metropolis_hastings(
    initial_state=0.0, num_steps=30, step_scale=1.0, print_every=1
)
print("\nFirst 10 samples:", [round(s, 3) for s in samples[:10]])
print("Last 10 samples:", [round(s, 3) for s in samples[-10:]])

When working with MCMC, you need to consider three important concepts: mixing, burn-in, and convergence. Mixing refers to how quickly the Markov chain explores all relevant regions of the target distribution. Good mixing means the chain moves efficiently and does not get stuck in one area. Burn-in is the initial set of samples you discard because the chain has not yet reached the stationary distribution — these early samples may be biased by the starting point. Convergence is the point at which the distribution of your samples closely matches the true target distribution; only after convergence do your samples become reliable for inference.
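
As a minimal sketch of what this looks like in practice, the snippet below reuses the metropolis_hastings function from the example above, discards an illustrative burn-in of 1,000 samples, and compares the means of the two halves of the remaining chain as a crude stationarity check. The specific numbers here are arbitrary choices for demonstration, not recommendations.

# A crude burn-in and convergence check, reusing metropolis_hastings
# from the example above. print_every is set very high to silence the log.
raw_samples = metropolis_hastings(
    initial_state=0.0, num_steps=5000, step_scale=1.0, print_every=10**9
)

burn_in = 1000  # illustrative; in practice, inspect trace plots to choose this
kept = raw_samples[burn_in:]

# If the chain has reached its stationary distribution, statistics of the
# first and second halves of the retained samples should roughly agree.
half = len(kept) // 2
mean_first = sum(kept[:half]) / half
mean_second = sum(kept[half:]) / (len(kept) - half)
print(f"mean of first half:  {mean_first:.3f}")
print(f"mean of second half: {mean_second:.3f}")

A large disagreement between the two halves suggests the chain has not converged yet and needs more iterations, a longer burn-in, or a better-tuned proposal.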

Despite its power, MCMC comes with some common pitfalls and assumptions. The chain must be able to reach every part of the distribution (ergodicity), and the proposal distribution must be chosen carefully to ensure good mixing. If the chain mixes poorly or you underestimate the burn-in period, your samples may not represent the true distribution. Additionally, MCMC assumes you can evaluate the target distribution up to a normalizing constant, but not always sample from it directly. Always check diagnostics to ensure your MCMC run is valid and your assumptions hold.
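
To make the proposal-tuning pitfall concrete, here is a small, hypothetical experiment; it assumes target_density and propose_new_state from the example above are defined, and the step scales 0.1, 1.0, and 5.0 are arbitrary illustrations. It measures how the acceptance rate changes with the proposal's step size: very high acceptance usually means tiny, slow-mixing steps, while very low acceptance means most proposals are wasted.

# Hypothetical helper: estimate the acceptance rate for a given step scale.
def acceptance_rate(step_scale: float, num_steps: int = 5000) -> float:
    current = 0.0
    accepted = 0
    for _ in range(num_steps):
        proposal = propose_new_state(current, step_scale)
        # Symmetric proposal, so the acceptance ratio is a density ratio
        ratio = target_density(proposal) / target_density(current)
        if random.random() < min(1.0, ratio):
            current = proposal
            accepted += 1
    return accepted / num_steps

for scale in (0.1, 1.0, 5.0):
    print(f"step_scale={scale:4.1f} -> acceptance rate {acceptance_rate(scale):.2f}")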

