Expressivity and Generalization in the Infinite-Width Limit

Expressivity in Infinite-Width Neural Networks

When considering neural networks in the infinite-width limit, mean field theory provides a rigorous framework to analyze their expressivity. In this regime, the network's behavior is dominated not by the specific values of individual weights, but by the statistical distribution of these weights across layers. This shift leads to a fundamental change in how you should understand the representational capabilities of such networks.

In the infinite-width limit, each neuron's pre-activation is a sum over a growing number of independent contributions, so by the central limit theorem its value across random initializations follows a Gaussian distribution, and by the law of large numbers the covariance of that distribution concentrates onto a deterministic value. As a result, the entire layer's output can be characterized by the evolution of these distributions from layer to layer, rather than by the precise configuration of weights. This distributional perspective means that the network's expressivity is governed by the family of functions that can be realized through these evolving distributions. Essentially, the network becomes a generator of functions parameterized by the means and covariances of these distributions, rather than by individual parameters.
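To make the distributional picture concrete, the short NumPy sketch below (an illustrative setup assumed here, not taken from the text; the width, input dimension, and number of draws are arbitrary choices) samples a one-hidden-layer random ReLU network many times for a single fixed input and checks that the output pre-activation is close to the Gaussian predicted in the infinite-width limit:

    import numpy as np

    # Minimal sketch (assumed setup): one hidden ReLU layer with 1/sqrt(width)
    # output scaling.  For a fixed input, the output pre-activation is a sum of
    # `width` independent terms, so over random initializations it approaches
    # the Gaussian predicted by the infinite-width analysis.
    rng = np.random.default_rng(0)
    width, d_in, n_draws = 2048, 10, 5000

    x = rng.normal(size=d_in)
    x /= np.linalg.norm(x)          # unit-norm input, so each hidden pre-activation ~ N(0, 1)

    samples = np.empty(n_draws)
    for s in range(n_draws):
        W1 = rng.normal(size=(width, d_in))   # hidden weights ~ N(0, 1)
        w2 = rng.normal(size=width)           # output weights ~ N(0, 1)
        h = np.maximum(0.0, W1 @ x)           # ReLU activations
        samples[s] = w2 @ h / np.sqrt(width)  # output pre-activation

    # Infinite-width prediction: zero mean, variance E[relu(z)^2] = 1/2 for z ~ N(0, 1).
    print(f"empirical mean {samples.mean():+.3f}  (theory 0.000)")
    print(f"empirical var  {samples.var():.3f}   (theory 0.500)")

Increasing the width tightens the match, while the individual weights that produced each draw are irrelevant: only their distribution matters, which is exactly the point of the distributional description above.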

One crucial consequence is that the function space accessible to infinite-width networks is closely linked to the choice of activation function and initialization. For example, under standard random initialization the network's output converges to a Gaussian process in the infinite-width limit (the neural network Gaussian process, or NNGP, correspondence), and the closely related neural tangent kernel (NTK) framework describes how this function changes during training. This connection implies that, in this regime, the network's expressivity is equivalent to that of a kernel method defined by the corresponding kernel. Thus, distributional representations in mean field theory serve as the foundation for understanding which functions the network can approximate or represent, and how the architecture and activation function shape this space.
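To see where this kernel comes from, the covariance of the layer-wise Gaussian distributions obeys a simple recursion (notation assumed here rather than taken from the text: K^{(l)} is the pre-activation covariance at layer l, φ the activation, σ_w² and σ_b² the weight and bias variances, and d the input dimension):

    K^{(0)}(x, x') = \sigma_b^{2} + \frac{\sigma_w^{2}}{d}\, x^{\top} x'

    K^{(l+1)}(x, x') = \sigma_b^{2} + \sigma_w^{2}\,
        \mathbb{E}_{(u, v) \sim \mathcal{N}\left(0,\, \Lambda^{(l)}\right)}
        \bigl[\varphi(u)\,\varphi(v)\bigr],
    \qquad
    \Lambda^{(l)} =
    \begin{pmatrix}
        K^{(l)}(x, x) & K^{(l)}(x, x') \\
        K^{(l)}(x', x) & K^{(l)}(x', x')
    \end{pmatrix}

The Gaussian process reached in the infinite-width limit has exactly this K as its covariance function, so the choice of activation and of the initialization variances directly determines which kernel, and hence which function space, the network realizes.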

Generalization Properties Predicted by Mean Field Theory

Mean field theory also provides key insights into the generalization properties of neural networks as their width grows. In the infinite-width limit, training dynamics often become linearized around the initialization, and the evolution of the network's function during training can be described by a deterministic kernel—again, typically the NTK. This linearization leads to several important theoretical predictions about generalization.

  • Since the network behaves like a kernel machine, its generalization performance can be analyzed with tools from kernel methods.
  • The generalization error is determined by the properties of the NTK and the distribution of the training data.
  • If the NTK aligns well with the structure of the data, the network generalizes effectively even as its width becomes very large.

Consequently, and contrary to classical concerns about overfitting in large models, infinite-width networks may actually exhibit improved generalization due to the regularizing effect of the mean field dynamics. The linearization behind these predictions is written out below.
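To make the linearization explicit (a standard derivation sketch; the symbols f, θ, Θ, η, X, and y are notation introduced here, not defined in the text): the trained network stays close to its first-order Taylor expansion around the initial parameters θ_0, and under gradient flow on the squared loss the training predictions evolve according to the fixed NTK Θ:

    f_{\theta_t}(x) \approx f_{\theta_0}(x) + \nabla_\theta f_{\theta_0}(x)^{\top}(\theta_t - \theta_0),
    \qquad
    \Theta(x, x') = \nabla_\theta f_{\theta_0}(x)^{\top}\,\nabla_\theta f_{\theta_0}(x')

    \frac{\mathrm{d}}{\mathrm{d}t} f_t(X) = -\eta\,\Theta(X, X)\,\bigl(f_t(X) - y\bigr)
    \quad\Longrightarrow\quad
    f_\infty(x) = f_0(x) + \Theta(x, X)\,\Theta(X, X)^{-1}\bigl(y - f_0(X)\bigr)

Here X denotes the training inputs and y the targets; the limiting predictor f_∞ is exactly a kernel regression solution with kernel Θ, which is what makes the kernel-methods analysis in the list above possible.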

Key theoretical results have shown that, under certain conditions, the test error of an infinite-width neural network converges to that of the corresponding kernel regression problem defined by the NTK. This convergence provides a predictive framework for understanding how neural networks generalize in the large-width regime, and why they can avoid overfitting despite their immense parameter count. Furthermore, mean field theory reveals that the implicit bias of gradient descent in this regime is towards functions favored by the NTK, which are often smooth or otherwise simple functions of the data.
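As a concrete illustration of this correspondence, the sketch below (an assumed, simplified setup: a one-hidden-layer ReLU network without biases, whose infinite-width covariance kernel has the closed-form arc-cosine expression used here as a stand-in for the full NTK of a deep architecture) computes the kernel-regression predictor that the converged infinite-width network is expected to match:

    import numpy as np

    # Minimal sketch (assumed setup): kernel regression with the closed-form
    # infinite-width kernel of a one-hidden-layer ReLU network (the order-1
    # arc-cosine kernel), used as a stand-in for the full NTK.
    def relu_kernel(A, B):
        """E_w[relu(w.a) relu(w.b)] for w ~ N(0, I), for all rows a of A and b of B."""
        na = np.linalg.norm(A, axis=1)[:, None]
        nb = np.linalg.norm(B, axis=1)[None, :]
        cos = np.clip((A @ B.T) / (na * nb), -1.0, 1.0)
        theta = np.arccos(cos)
        return na * nb * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))                       # toy training inputs
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)    # toy targets
    X_test = rng.normal(size=(5, 3))                   # toy test inputs

    jitter = 1e-6                                      # small ridge for numerical stability
    K = relu_kernel(X, X) + jitter * np.eye(len(X))
    alpha = np.linalg.solve(K, y)                      # (K + jitter*I)^{-1} y
    f_test = relu_kernel(X_test, X) @ alpha            # kernel-regression prediction
    print(f_test)

In the exact NTK statement the kernel would be the NTK of the chosen architecture rather than this single-layer covariance kernel, but the predictor has the same form, and it is this kernel-regression problem whose test error the infinite-width network is predicted to match.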

Overall, mean field theory unifies the understanding of expressivity and generalization in infinite-width neural networks by connecting them to kernel methods and distributional representations. This perspective clarifies why such networks can be both highly expressive and capable of generalizing well, provided the architecture and training dynamics are appropriately chosen.

