
Generalization Bounds with Rademacher Complexity

When you want to predict how well your chosen model will perform on new, unseen data, you need a way to relate its performance on your current sample to its expected performance in general. One powerful approach is to use generalization bounds based on Rademacher complexity. These bounds are particularly useful because they adapt to the actual data you have, rather than only considering the worst-case scenario for your entire hypothesis class.

A Rademacher-based generalization bound typically has the following structure: it states that, with high probability, the true risk (expected error) of any function in your hypothesis class is not much larger than its empirical risk (error on your sample), plus a term involving the empirical Rademacher complexity of the class and a confidence term that depends on the sample size. The empirical Rademacher complexity measures how well your class can fit random noise on your specific dataset, providing a data-dependent measure of capacity.

Generalization Bound (using empirical Rademacher complexity)

With probability at least $1 - \delta$ over the choice of the sample $S = (x_1, \ldots, x_n)$, for all functions $f$ in a function class $\mathcal{F}$:

$$\mathbb{E}_{x}[f(x)] \leq \frac{1}{n} \sum_{i=1}^n f(x_i) + 2\hat{\mathcal{R}}_S(\mathcal{F}) + 3\sqrt{\frac{\log(2/\delta)}{2n}}$$

where:

  • $\mathbb{E}_{x}[f(x)]$ is the expected (true) risk of $f$;
  • $\frac{1}{n} \sum_{i=1}^n f(x_i)$ is the empirical risk on the sample;
  • $\hat{\mathcal{R}}_S(\mathcal{F})$ is the empirical Rademacher complexity of $\mathcal{F}$ on sample $S$ (defined below);
  • $n$ is the sample size;
  • $\delta$ is the confidence parameter.
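For reference, the empirical Rademacher complexity is standardly defined as follows: given independent Rademacher variables $\sigma_1, \ldots, \sigma_n$, each equal to $+1$ or $-1$ with probability $1/2$,

$$\hat{\mathcal{R}}_S(\mathcal{F}) = \mathbb{E}_{\sigma}\left[\sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n \sigma_i f(x_i)\right]$$

Intuitively, it measures how strongly the best function in $\mathcal{F}$ can correlate with pure noise on your particular sample.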
Intuition: Why data-dependent terms can be tighter

Rademacher complexity uses the actual data sample to measure how well your hypothesis class can fit random noise. If your data is simple or your class does not fit noise well on this sample, the complexity term can be much smaller than worst-case estimates, leading to a tighter (more realistic) bound.
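To see this data dependence concretely, here is a minimal sketch (the `est_rad` helper is a hypothetical Monte Carlo estimator, not from this lesson) comparing the threshold class on two samples of equal size: one with 200 distinct points and one with only 5 distinct values. Fewer distinct values means fewer realizable labelings, so the estimated complexity on the clumped sample should come out smaller.

import numpy as np

rng = np.random.default_rng(0)

def est_rad(X, trials=200):
    # Monte Carlo estimate of the empirical Rademacher complexity of
    # threshold functions f_t(x) = 1{x >= t} on the fixed sample X
    # (absolute-value variant, matching the simulation further below).
    X = np.sort(X)
    n = len(X)
    vals = []
    for _ in range(trials):
        sigma = rng.choice([-1, 1], size=n)
        # With X sorted, threshold t = X[i] labels exactly X[i:] as 1,
        # so its correlation with the noise is |sum(sigma[i:])| / n.
        suffix = np.abs(np.cumsum(sigma[::-1])[::-1]) / n
        vals.append(suffix.max())
    return float(np.mean(vals))

n = 200
spread = rng.uniform(0, 1, n)                        # 200 distinct points
clumped = np.repeat(rng.uniform(0, 1, 5), n // 5)    # only 5 distinct values

print("R_hat estimate, spread sample :", est_rad(spread))
print("R_hat estimate, clumped sample:", est_rad(clumped))

A worst-case (distribution-free) capacity measure cannot distinguish these two samples; the empirical Rademacher complexity does, which is exactly why the resulting bound can be tighter.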

Formal bound

The formal generalization bound combines the empirical risk, the empirical Rademacher complexity, and a confidence term. This shows that generalization depends not just on the hypothesis class, but also on how it behaves on your specific data.
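As a quick numeric illustration (a minimal sketch with made-up values; the `rademacher_bound` helper is hypothetical), the right-hand side of the bound is easy to evaluate once you have an empirical risk, a complexity estimate, a sample size, and a confidence level:

import numpy as np

def rademacher_bound(emp_risk, emp_rad, n, delta):
    # Right-hand side of the bound:
    # emp_risk + 2 * R_hat + 3 * sqrt(log(2/delta) / (2n))
    return emp_risk + 2 * emp_rad + 3 * np.sqrt(np.log(2 / delta) / (2 * n))

# Illustrative (made-up) values: 10% empirical risk, R_hat = 0.03,
# n = 1000 samples, 95% confidence (delta = 0.05).
print(rademacher_bound(emp_risk=0.10, emp_rad=0.03, n=1000, delta=0.05))
# ≈ 0.10 + 0.06 + 0.129 ≈ 0.29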

import numpy as np
import matplotlib.pyplot as plt

# Toy hypothesis class: all threshold functions on [0, 1].
# For simplicity, we approximate the empirical Rademacher complexity for thresholds.
def empirical_rademacher_complexity(n, num_trials=100):
    # For thresholds, the empirical Rademacher complexity is about 1/sqrt(n).
    # We'll estimate it empirically for illustration.
    complexities = []
    for _ in range(num_trials):
        X = np.sort(np.random.uniform(0, 1, n))
        sigma = np.random.choice([-1, 1], size=n)
        thresholds = X
        max_sum = 0
        # For each threshold, compute sum of sigma_i * f(x_i)
        for t in thresholds:
            f_x = (X >= t).astype(float)
            s = np.abs(np.sum(sigma * f_x)) / n
            if s > max_sum:
                max_sum = s
        complexities.append(max_sum)
    return np.mean(complexities)

sample_sizes = np.arange(10, 201, 10)
complexities = [empirical_rademacher_complexity(n) for n in sample_sizes]

plt.figure(figsize=(8, 4))
plt.plot(sample_sizes, complexities, marker='o', label="Empirical Rademacher Complexity")
plt.plot(sample_sizes, 1 / np.sqrt(sample_sizes), '--', label="1/sqrt(n) reference")
plt.xlabel("Sample size (n)")
plt.ylabel("Empirical Rademacher Complexity")
plt.title("Rademacher Complexity vs. Sample Size (Threshold Functions)")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
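If you run this, the Monte Carlo estimate should roughly track the $1/\sqrt{n}$ reference curve. That is what theory predicts: on $n$ points, thresholds realize at most $n + 1$ distinct labelings, so Massart's finite-class lemma bounds the empirical Rademacher complexity by about $\sqrt{2\log(n+1)/n}$, which decays at the $1/\sqrt{n}$ rate up to a logarithmic factor.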

What is the main advantage of using Rademacher complexity in generalization bounds?

