Classical Generalization Inequalities
To understand generalization in machine learning, you need to know how well a hypothesis trained on a finite sample will perform on unseen data. Classical VC-based generalization bounds give you a way to quantify this, by relating the true risk (expected loss on new data) to the empirical risk (average loss on your training set). The typical structure of such a bound is:
True risk ≤ Empirical risk + Complexity penalty

Here, the empirical risk measures how well your hypothesis fits the training data, while the complexity penalty accounts for the richness of your hypothesis class and how many samples you have. The richer the class (higher VC dimension) or the fewer the samples, the larger the penalty. As you gather more data, this penalty shrinks, meaning your empirical results become more reliable for the true distribution.
A classical VC generalization inequality states that, with probability at least 1 − δ over the choice of a random sample of size n, for every hypothesis h in a class H of VC dimension d:
$$R(h) \;\le\; \hat{R}(h) + \sqrt{\frac{8}{n}\left(d \log\frac{2en}{d} + \log\frac{4}{\delta}\right)}$$

where:

- R(h): expected error of hypothesis h on unseen data;
- R̂(h): average error of hypothesis h on the training sample;
- d: VC dimension of the hypothesis class;
- n: sample size;
- δ: confidence parameter.
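To make the formula concrete, here is a minimal sketch that plugs in some illustrative values (d = 10, δ = 0.05, and a few sample sizes); these numbers are assumptions chosen for demonstration only, not part of the bound itself.

```python
from math import e, log, sqrt

def complexity_term(n, d, delta):
    # Penalty term of the VC bound: sqrt((8/n) * (d*log(2en/d) + log(4/delta)))
    return sqrt((8 / n) * (d * log(2 * e * n / d) + log(4 / delta)))

# Illustrative values (assumptions for demonstration only)
d, delta = 10, 0.05
for n in (100, 1_000, 10_000):
    print(f"n = {n:>6}: penalty ≈ {complexity_term(n, d, delta):.3f}")
```

Note that for small samples the penalty can exceed 1, which makes the bound vacuous for the 0–1 loss; it only becomes informative once n is much larger than d.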
The more samples you have, the better your empirical risk estimates the true risk. With a small sample, random fluctuations can make the empirical risk unreliable, so the bound must allow for more error. As the sample size increases, these fluctuations average out, and the complexity penalty shrinks.
A higher VC dimension means your hypothesis class can fit more patterns, including random noise. This flexibility increases the risk of overfitting, so the bound includes a larger penalty for higher VC dimension. A smaller VC dimension means simpler models, which are less likely to overfit.
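To see this effect numerically, the following sketch holds the sample size fixed and varies the VC dimension; the particular values (n = 1,000, δ = 0.05, and the list of dimensions) are illustrative assumptions.

```python
import numpy as np

def complexity_term(n, d, delta):
    # Penalty term of the VC bound above
    return np.sqrt((8 / n) * (d * np.log(2 * np.e * n / d) + np.log(4 / delta)))

n, delta = 1_000, 0.05           # fixed sample size and confidence (illustrative)
for d in (1, 5, 10, 50, 100):    # increasingly rich hypothesis classes
    print(f"d = {d:>3}: penalty ≈ {complexity_term(n, d, delta):.3f}")
```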
The inequality gives a high-probability guarantee: for every hypothesis in your class simultaneously, the true risk is at most the empirical risk plus a term that depends on both the VC dimension and the sample size. This helps you understand how much trust to place in your training results.
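As a rough sanity check of this high-probability reading, the sketch below compares the actual gap between true and empirical risk with the bound's penalty, for a single fixed threshold classifier on synthetic one-dimensional data. Everything here (the data distribution, the 10% label noise, the threshold at 0.5, and treating a large held-out sample as the "true" risk) is an assumption made purely for illustration; the real bound holds uniformly over the whole class, so it is deliberately much looser than the gap observed for one hypothesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    # Synthetic 1-D data: x ~ Uniform(0, 1), label 1[x > 0.5] with 10% label noise (illustrative)
    x = rng.uniform(0, 1, n)
    y = (x > 0.5).astype(int)
    flip = rng.uniform(0, 1, n) < 0.1
    return x, np.where(flip, 1 - y, y)

def risk(x, y, threshold):
    # 0-1 error of the threshold classifier h(x) = 1[x > threshold]
    return np.mean((x > threshold).astype(int) != y)

def complexity_term(n, d, delta):
    # Penalty term of the VC bound above
    return np.sqrt((8 / n) * (d * np.log(2 * np.e * n / d) + np.log(4 / delta)))

threshold, delta = 0.5, 0.05
d = 1  # threshold classifiers on the real line have VC dimension 1

# Approximate the true risk with a very large held-out sample
x_big, y_big = sample(1_000_000)
true_risk = risk(x_big, y_big, threshold)

for n in (100, 1_000, 10_000):
    x, y = sample(n)
    emp = risk(x, y, threshold)
    print(f"n = {n:>6}: empirical = {emp:.3f}, true ≈ {true_risk:.3f}, "
          f"gap = {true_risk - emp:+.3f}, penalty = {complexity_term(n, d, delta):.3f}")
```

The observed gap is typically far smaller than the penalty; this is expected, since the bound must cover every hypothesis in the class and hold for all but a δ-fraction of samples, so it is conservative by construction.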
```python
import numpy as np
import matplotlib.pyplot as plt
from math import e

# Fixed VC dimension
d = 10
# Confidence parameter
delta = 0.05
# Range of sample sizes
n_values = np.arange(20, 2001, 20)

def complexity_term(n, d, delta):
    # Penalty term of the VC bound: sqrt((8/n) * (d*log(2en/d) + log(4/delta)))
    return np.sqrt((8 / n) * (d * np.log(2 * e * n / d) + np.log(4 / delta)))

penalties = [complexity_term(n, d, delta) for n in n_values]

plt.figure(figsize=(8, 5))
plt.plot(n_values, penalties, label=f"VC dimension d={d}")
plt.xlabel("Sample size (n)")
plt.ylabel("Complexity penalty")
plt.title("Complexity penalty vs. sample size (fixed VC dimension)")
plt.grid(True)
plt.legend()
plt.show()
```