Rademacher Complexity: Data-Dependent Bounds
When you want to understand how well a learning algorithm will generalize from a finite dataset to unseen data, you need to measure the complexity of the hypothesis class it uses. Previously, you learned about the VC dimension, a classic way to quantify this complexity in a data-independent manner: it depends only on the hypothesis class, not on the specific data sample. However, this can lead to loose generalization bounds, especially when the actual data makes learning easier than the worst-case scenario considered by the VC dimension.
This motivates the use of data-dependent complexity measures. These measures adapt to the actual data you have, often giving much tighter and more realistic generalization guarantees. Among these, Rademacher complexity stands out as a powerful and widely used tool. It quantifies how well a hypothesis class can fit random noise on your specific dataset, offering a direct way to connect the richness of your model class to the data at hand.
Definition: The empirical Rademacher complexity of a function class $\mathcal{F}$ on a sample $S = (x_1, \dots, x_n)$ is defined as the expected value of the supremum over $\mathcal{F}$ of the average correlation between functions in $\mathcal{F}$ and random signs. Formally, it is:
$$\hat{\mathcal{R}}_S(\mathcal{F}) = \mathbb{E}_{\sigma}\!\left[\sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i)\right],$$
where each $\sigma_i$ is an independent random variable taking values $+1$ or $-1$ with equal probability (the "random signs").
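For a finite class and a small sample, this expectation can be computed exactly by enumerating all 2^n sign vectors. The sketch below does exactly that; the function class (three 0/1-valued functions, given directly by their values on the sample) is made up purely for illustration:

```python
import itertools
import numpy as np

# Exact empirical Rademacher complexity for a small, finite function class,
# obtained by enumerating every sign vector sigma in {-1, +1}^n.
def exact_rademacher(function_values):
    # function_values: shape (num_functions, n); row j holds
    # (f_j(x_1), ..., f_j(x_n)) for one function in the class.
    F = np.asarray(function_values, dtype=float)
    n = F.shape[1]
    total = 0.0
    for signs in itertools.product([-1.0, 1.0], repeat=n):
        sigma = np.array(signs)
        # supremum over the class of the average correlation with sigma
        total += np.max(F @ sigma) / n
    # each sign vector has probability 2^(-n)
    return total / 2 ** n

# Toy class on n = 4 points (values chosen only for illustration):
# a constant function and two "step" patterns.
F = np.array([
    [1, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
])
print(f"Exact empirical Rademacher complexity: {exact_rademacher(F):.3f}")
```

For larger samples this enumeration becomes infeasible, which is why the example later in this lesson approximates the expectation by sampling sign vectors instead.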
Using random signs (Rademacher variables) tests how well a function class can fit pure noise. If a class can match random patterns in the data, it is likely too complex and prone to overfitting. The Rademacher complexity measures this tendency directly.
The empirical Rademacher complexity is computed by averaging, over all possible assignments of random +1 and -1 to the data points, the largest average correlation any function in the class can achieve with these random assignments. This gives a data-dependent measure of how "rich" or "flexible" the function class is on the given sample.
```python
import numpy as np

# Estimate the empirical Rademacher complexity of 1D threshold functions
# via Monte Carlo sampling of random sign vectors.
def rademacher_complexity_thresholds(X, n_trials=1000):
    n = len(X)
    # Function class: all threshold functions f_t(x) = 1 if x >= t else 0
    thresholds = np.unique(X)
    complexities = []
    for _ in range(n_trials):
        sigma = np.random.choice([-1, 1], size=n)
        max_corr = 0
        for t in thresholds:
            f_t = (X >= t).astype(float)
            corr = np.dot(sigma, f_t) / n
            # abs() gives the two-sided variant, i.e. it also allows the
            # negated thresholds -f_t in the class
            max_corr = max(max_corr, abs(corr))
        complexities.append(max_corr)
    return np.mean(complexities)

np.random.seed(0)
X = np.random.uniform(0, 1, size=20)
rc = rademacher_complexity_thresholds(X)
print(f"Estimated empirical Rademacher complexity (thresholds, n=20): {rc:.3f}")
```
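To see how this measure separates simple classes from overly rich ones, it helps to contrast the threshold class with a class that contains every possible ±1 labeling of the sample. Such a class can always reproduce the random signs exactly, so its empirical Rademacher complexity is 1 no matter what the data look like. A minimal sketch of that extreme case (the function and names are illustrative, not part of the lesson's code):

```python
import numpy as np

# A class containing every possible +/-1 labeling of the sample can always
# match the random signs exactly, so for each draw of sigma the supremum of
# (1/n) * sum_i sigma_i * f(x_i) equals 1, and so does its expectation.
def rademacher_complexity_all_labelings(n, n_trials=1000):
    complexities = []
    for _ in range(n_trials):
        sigma = np.random.choice([-1, 1], size=n)
        f_best = sigma  # the labeling in the class that reproduces the noise
        complexities.append(np.dot(sigma, f_best) / n)
    return np.mean(complexities)

print(f"All-labelings class (n=20): {rademacher_complexity_all_labelings(20):.3f}")
```

The threshold class estimated above scores far lower, reflecting the fact that one-dimensional thresholds cannot track arbitrary noise patterns, which is the behavior a well-generalizing class should show.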