Uniform Convergence: From Pointwise to Uniform Guarantees
When you study generalization in machine learning, it is crucial to understand the difference between guarantees that apply to a single hypothesis and those that apply to all hypotheses within a class. A pointwise guarantee provides a statement about how well the empirical risk (the average loss on your training data) approximates the true risk (the expected loss over the data distribution) for a specific hypothesis. In contrast, a uniform guarantee asserts that this approximation holds simultaneously for every hypothesis in a given class. This distinction is at the heart of why uniform convergence is so important for learning theory.
Uniform convergence means that, with high probability, the empirical risk and the true risk are close for every hypothesis in the class simultaneously. This is essential because training selects the final hypothesis based on its performance on the training data, so the choice itself depends on the sample. A pointwise guarantee holds only for a hypothesis fixed before the data are seen, and may therefore fail for the data-dependent hypothesis you actually pick. Because a uniform guarantee covers every hypothesis at once, it covers your selected one in particular; this is why uniform convergence underpins the reliability of empirical risk minimization and is a cornerstone of modern learning theory.
The failure mode of pointwise guarantees is easy to see concretely. When many hypotheses are evaluated on the same finite sample, some will look good purely by chance, and choosing the empirical minimizer systematically favors those lucky ones, so its training error is an optimistic estimate of its true risk. The sketch below simulates this selection effect.
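A minimal illustration, not taken from the text above: every hypothesis in the simulation has the same true risk of 0.3, yet the one chosen for its lowest empirical risk looks far better on the training sample than it really is, while a hypothesis fixed in advance shows no such bias. All constants here are arbitrary choices for the demo.

import numpy as np

# Illustrative sketch: selection bias when the hypothesis is chosen by
# minimizing empirical risk. Every hypothesis has the SAME true risk (0.3),
# and per-example losses are simulated as Bernoulli draws.
rng = np.random.default_rng(0)
true_risk = 0.3
n_samples = 50
n_hypotheses = 1000
n_trials = 200

gaps_fixed, gaps_selected = [], []
for _ in range(n_trials):
    # Rows: hypotheses; columns: 0/1 losses on one shared training set
    losses = rng.binomial(1, true_risk, size=(n_hypotheses, n_samples))
    emp_risks = losses.mean(axis=1)
    gaps_fixed.append(true_risk - emp_risks[0])        # hypothesis fixed in advance
    gaps_selected.append(true_risk - emp_risks.min())  # data-dependent choice

print(f"average optimism, fixed hypothesis:    {np.mean(gaps_fixed):+.3f}")    # close to 0
print(f"average optimism, selected hypothesis: {np.mean(gaps_selected):+.3f}")  # clearly > 0

A pointwise bound applies to the first row alone; only a uniform bound over all 1000 rows covers the empirical minimizer.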
For a hypothesis class H and a loss function, uniform convergence means that for any tolerance ε > 0 and failure probability δ > 0, once the training sample is large enough, the following holds with probability at least 1 − δ over the draw of that sample:
For all hypotheses h in H,
| empirical risk of h − true risk of h | ≤ ε.
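For a finite class, a standard route to this guarantee, not spelled out above, applies Hoeffding's inequality to each hypothesis and a union bound over the class: assuming losses bounded in [0, 1], any sample size n ≥ ln(2|H| / δ) / (2ε²) suffices. A short sketch of that textbook bound:

import numpy as np

# Hedged sketch: sufficient sample size for uniform convergence over a FINITE
# hypothesis class, via Hoeffding's inequality plus a union bound, assuming
# losses bounded in [0, 1]:  n >= ln(2|H| / delta) / (2 * eps**2)
def uniform_convergence_sample_size(class_size, eps, delta):
    return int(np.ceil(np.log(2 * class_size / delta) / (2 * eps**2)))

print(uniform_convergence_sample_size(class_size=5, eps=0.05, delta=0.05))      # 1060
print(uniform_convergence_sample_size(class_size=10**6, eps=0.05, delta=0.05))  # 3501

The dependence on |H| is only logarithmic, which is why even very large finite classes remain learnable from a modest number of samples.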
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

n_samples = 100
n_hypotheses = 5

# Simulate true risks for each hypothesis
true_risks = np.linspace(0.1, 0.5, n_hypotheses)
empirical_risks = []

for risk in true_risks:
    # Simulate empirical risk as sample mean of Bernoulli trials
    samples = np.random.binomial(1, risk, size=n_samples)
    # Track empirical risk over increasing sample sizes
    curve = [np.mean(samples[:i+1]) for i in range(n_samples)]
    empirical_risks.append(curve)

x = np.arange(1, n_samples + 1)
plt.figure(figsize=(8, 5))
for idx, curve in enumerate(empirical_risks):
    plt.plot(x, curve, label=f"Hypothesis {idx+1} (true risk={true_risks[idx]:.2f})")
    plt.hlines(true_risks[idx], 1, n_samples, colors='k', linestyles='dashed', alpha=0.4)
plt.xlabel("Sample size")
plt.ylabel("Empirical risk")
plt.title("Empirical vs. True Risk for Multiple Hypotheses")
plt.legend()
plt.tight_layout()
plt.show()
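In the resulting plot, each solid empirical-risk curve fluctuates at small sample sizes and settles toward its dashed true-risk line as the sample grows. Uniform convergence asks for more than any single curve converging: the largest gap across all five curves must fall below ε simultaneously.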