
Classical Generalization Inequalities

To understand generalization in machine learning, you need to know how well a hypothesis trained on a finite sample will perform on unseen data. Classical VC-based generalization bounds give you a way to quantify this, by relating the true risk (expected loss on new data) to the empirical risk (average loss on your training set). The typical structure of such a bound is:

\text{True risk} \le \text{Empirical risk} + \text{Complexity penalty}

Here, the empirical risk measures how well your hypothesis fits the training data, while the complexity penalty accounts for the richness of your hypothesis class and how many samples you have. The richer the class (higher VC dimension) or the fewer the samples, the larger the penalty. As you gather more data, this penalty shrinks, meaning your empirical results become a more reliable guide to performance under the true distribution.

Note

A classical VC generalization inequality states that, with probability at least 1 − δ over the choice of a random sample of size n, for every hypothesis h in a class H of VC dimension d:

R(h) \le \hat{R}(h) + \sqrt{ \frac{8}{n} \left( d \,\log\left(\frac{2 e n}{d}\right) + \log\left(\frac{4}{\delta}\right) \right) }

where:

  • R(h): expected error of hypothesis h on unseen data;
  • \hat{R}(h): average error of hypothesis h on the training sample;
  • d: VC dimension of the hypothesis class;
  • n: sample size;
  • δ: confidence parameter.
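
To make the inequality concrete, here is a minimal sketch that plugs illustrative numbers into the bound. The empirical risk of 0.10, the VC dimension of 10, the sample size of 1000, and δ = 0.05 are assumptions chosen purely for demonstration, not results from a real experiment.

import numpy as np

# Illustrative values (assumed for this sketch, not measured on real data)
empirical_risk = 0.10   # training error of some hypothesis h
d = 10                  # VC dimension of the hypothesis class
n = 1000                # sample size
delta = 0.05            # confidence parameter (95% confidence)

# Complexity penalty from the VC inequality above
penalty = np.sqrt((8 / n) * (d * np.log(2 * np.e * n / d) + np.log(4 / delta)))
upper_bound = empirical_risk + penalty

print(f"Complexity penalty: {penalty:.3f}")
print(f"With probability >= {1 - delta}, true risk R(h) <= {upper_bound:.3f}")

With these numbers the penalty comes out to roughly 0.73, so the guarantee is only that the true risk does not exceed about 0.83. Classical VC bounds are typically loose in absolute terms; their value lies mainly in how they scale with d and n rather than in their exact magnitude.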
Why does the bound depend on sample size?

The more samples you have, the better your empirical risk estimates the true risk. With a small sample, random fluctuations can make the empirical risk unreliable, so the bound must allow for more error. As the sample size increases, these fluctuations average out, and the complexity penalty shrinks.
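
A quick simulation makes this averaging effect visible. The sketch below is a toy example under stated assumptions: it imagines a fixed hypothesis whose true error rate is 0.2 and repeatedly draws samples of different sizes, measuring how much the empirical risk fluctuates around that true value.

import numpy as np

rng = np.random.default_rng(0)
true_risk = 0.2  # assumed true error rate of a fixed hypothesis (toy value)

for n in [20, 200, 2000]:
    # Empirical risk over 10,000 independent samples of size n:
    # each sample contributes the fraction of misclassified points.
    empirical_risks = rng.binomial(n, true_risk, size=10_000) / n
    print(f"n = {n:4d}: std of empirical risk across samples = {empirical_risks.std():.3f}")

The spread shrinks roughly like 1/sqrt(n) (about 0.09 at n = 20 versus about 0.009 at n = 2000), which is exactly why the complexity penalty in the bound can be smaller for larger samples.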

Why does the bound depend on VC dimension?

A higher VC dimension means your hypothesis class can fit more patterns, including random noise. This flexibility increases the risk of overfitting, so the bound includes a larger penalty for higher VC dimension. A smaller VC dimension means simpler models, which are less likely to overfit.
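
To see the effect of the VC dimension in isolation, the following sketch evaluates the complexity penalty for a few values of d at a fixed sample size; the choices n = 1000 and δ = 0.05 are arbitrary values used only for illustration.

import numpy as np

def complexity_term(n, d, delta):
    # Complexity penalty from the VC inequality above
    return np.sqrt((8 / n) * (d * np.log(2 * np.e * n / d) + np.log(4 / delta)))

n, delta = 1000, 0.05
for d in [2, 10, 50]:
    print(f"d = {d:2d}: penalty = {complexity_term(n, d, delta):.3f}")

At the same sample size, moving from d = 2 to d = 50 pushes the penalty from roughly 0.4 to well above 1, so a richer hypothesis class needs far more data before the bound says anything useful.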

What does the formal inequality tell you?

The inequality gives a high-probability guarantee: for any hypothesis in your class, its true risk is at most its empirical risk plus a term that depends on both the VC dimension and the sample size. This helps you understand how much trust to place in your training results.

import numpy as np
import matplotlib.pyplot as plt

# Fixed VC dimension
d = 10
# Confidence parameter
delta = 0.05
# Range of sample sizes
n_values = np.arange(20, 2001, 20)

def complexity_term(n, d, delta):
    # Complexity penalty from the VC bound: sqrt((8/n) * (d*log(2en/d) + log(4/delta)))
    return np.sqrt((8 / n) * (d * np.log(2 * np.e * n / d) + np.log(4 / delta)))

penalties = [complexity_term(n, d, delta) for n in n_values]

plt.figure(figsize=(8, 5))
plt.plot(n_values, penalties, label=f"VC dimension d={d}")
plt.xlabel("Sample size (n)")
plt.ylabel("Complexity penalty")
plt.title("Complexity penalty vs. sample size (fixed VC dimension)")
plt.grid(True)
plt.legend()
plt.show()
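
The resulting curve decays roughly like sqrt(d · log n / n): with these settings (d = 10, δ = 0.05) the penalty only drops below 1 at around n ≈ 500, and it is still close to 0.55 at n = 2000. This illustrates a well-known property of classical VC bounds: they are often numerically loose, and their main value is in showing how the required sample size scales with the complexity of the hypothesis class.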
