Classical Generalization Inequalities
To understand generalization in machine learning, you need to know how well a hypothesis trained on a finite sample will perform on unseen data. Classical VC-based generalization bounds give you a way to quantify this, by relating the true risk (expected loss on new data) to the empirical risk (average loss on your training set). The typical structure of such a bound is:
$$\text{True risk} \le \text{Empirical risk} + \text{Complexity penalty}$$

Here, the empirical risk measures how well your hypothesis fits the training data, while the complexity penalty accounts for the richness of your hypothesis class and how many samples you have. The richer the class (higher VC dimension) or the fewer the samples, the larger the penalty. As you gather more data, this penalty shrinks, meaning your empirical results become more reliable for the true distribution.
A classical VC generalization inequality states that, with probability at least 1−δ over the choice of a random sample of size n, for every hypothesis h in a class H of VC dimension d:
$$R(h) \le \hat{R}(h) + \sqrt{\frac{8}{n}\left(d \log\frac{2en}{d} + \log\frac{4}{\delta}\right)}$$

where:
- $R(h)$: expected error of hypothesis $h$ on unseen data;
- $\hat{R}(h)$: average error of hypothesis $h$ on the training sample;
- $d$: VC dimension of the hypothesis class;
- $n$: sample size;
- $\delta$: confidence parameter.
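To see how the pieces fit together, here is a minimal sketch that evaluates the bound for one concrete setting; the sample size, VC dimension, empirical risk, and confidence level below are illustrative assumptions, not values taken from the statement above.

```python
import numpy as np
from math import e

def vc_bound(emp_risk, n, d, delta):
    # True-risk upper bound: empirical risk plus the VC complexity penalty
    # sqrt((8/n) * (d*log(2en/d) + log(4/delta))).
    penalty = np.sqrt((8 / n) * (d * np.log(2 * e * n / d) + np.log(4 / delta)))
    return emp_risk + penalty

# Illustrative (assumed) values: 1,000 samples, VC dimension 10,
# empirical risk 0.08, confidence parameter delta = 0.05.
bound = vc_bound(emp_risk=0.08, n=1000, d=10, delta=0.05)
print(f"With probability at least 0.95, the true risk is at most {bound:.3f}")
```

Even with 1,000 samples the penalty is large, which illustrates how loose classical VC guarantees tend to be unless the sample size is very large relative to the VC dimension.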
The more samples you have, the better your empirical risk estimates the true risk. With a small sample, random fluctuations can make the empirical risk unreliable, so the bound must allow for more error. As the sample size increases, these fluctuations average out, and the complexity penalty shrinks.
A higher VC dimension means your hypothesis class can fit more patterns, including random noise. This flexibility increases the risk of overfitting, so the bound includes a larger penalty for higher VC dimension. A smaller VC dimension means simpler models, which are less likely to overfit.
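The sketch below makes this concrete by holding the sample size fixed and increasing the VC dimension; the specific values of $n$, $\delta$, and the dimensions tried are assumptions chosen only for illustration.

```python
import numpy as np
from math import e

def complexity_term(n, d, delta):
    # The same VC complexity penalty as in the bound above.
    return np.sqrt((8 / n) * (d * np.log(2 * e * n / d) + np.log(4 / delta)))

# Fixed sample size and confidence, varying VC dimension (illustrative values).
n, delta = 1000, 0.05
for d in [1, 5, 10, 25, 50]:
    print(f"d = {d:>2}: complexity penalty = {complexity_term(n, d, delta):.3f}")
```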
The inequality gives a high-probability guarantee: for any hypothesis in your class, its true risk is at most its empirical risk plus a term that depends on both the VC dimension and the sample size. This helps you understand how much trust to place in your training results.
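The snippet below shows the complementary effect: with the VC dimension held fixed, the complexity penalty shrinks as the sample size grows.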
```python
import numpy as np
import matplotlib.pyplot as plt
from math import e

# Fixed VC dimension
d = 10

# Confidence parameter
delta = 0.05

# Range of sample sizes
n_values = np.arange(20, 2001, 20)

def complexity_term(n, d, delta):
    # VC complexity penalty: sqrt((8/n) * (d*log(2en/d) + log(4/delta)))
    return np.sqrt((8 / n) * (d * np.log(2 * e * n / d) + np.log(4 / delta)))

penalties = [complexity_term(n, d, delta) for n in n_values]

plt.figure(figsize=(8, 5))
plt.plot(n_values, penalties, label=f"VC dimension d={d}")
plt.xlabel("Sample size (n)")
plt.ylabel("Complexity penalty")
plt.title("Complexity penalty vs. sample size (fixed VC dimension)")
plt.grid(True)
plt.legend()
plt.show()
```