Mean Field Theory for Neural Networks

Infinite-Width Neural Networks: Formalization and Assumptions

To understand the infinite-width limit of neural networks, consider a fully connected feedforward network with $L$ layers. Let $N_\ell$ denote the number of neurons in layer $\ell$, where $N_0$ is the input dimension and $N_L$ is the output dimension. For each layer $\ell = 1, \ldots, L$, let $W^{(\ell)} \in \mathbb{R}^{N_\ell \times N_{\ell-1}}$ be the weight matrix, $b^{(\ell)} \in \mathbb{R}^{N_\ell}$ the bias vector, and $h^{(\ell)} \in \mathbb{R}^{N_\ell}$ the pre-activation vector.

The activations $x^{(\ell)} \in \mathbb{R}^{N_\ell}$ are defined by applying a nonlinearity $\phi$ elementwise:

$$x^{(\ell)} = \phi\!\left(h^{(\ell)}\right),$$

where $\phi$ may be ReLU, tanh, or another activation function.

Given an input $x^{(0)}$, the forward pass is defined recursively by

$$h^{(\ell)} = W^{(\ell)} x^{(\ell-1)} + b^{(\ell)}, \qquad x^{(\ell)} = \phi\!\left(h^{(\ell)}\right), \quad \ell = 1, \ldots, L.$$
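
The recursion above maps directly to code. Below is a minimal sketch of the forward pass; the widths, the tanh nonlinearity, and the unscaled random parameters are illustrative choices for this example, not values prescribed by the formalization.

```python
import numpy as np

def phi(h):
    """Elementwise nonlinearity; tanh is used here purely as an example."""
    return np.tanh(h)

def forward(x0, weights, biases):
    """Forward pass h^(l) = W^(l) x^(l-1) + b^(l), x^(l) = phi(h^(l)) for l = 1, ..., L."""
    x = x0
    pre_activations = []
    for W, b in zip(weights, biases):
        h = W @ x + b              # pre-activation of layer l
        pre_activations.append(h)
        x = phi(h)                 # activation of layer l
    return x, pre_activations

# Example usage for widths N_0 = 4, N_1 = 8, N_2 = 8, N_3 = 2.
widths = [4, 8, 8, 2]
rng = np.random.default_rng(0)
weights = [rng.normal(size=(widths[l], widths[l - 1])) for l in range(1, len(widths))]
biases = [rng.normal(size=widths[l]) for l in range(1, len(widths))]
output, hs = forward(rng.normal(size=widths[0]), weights, biases)
```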

The infinite-width limit corresponds to the regime in which the widths of all hidden layers,

$$N_1, N_2, \ldots, N_{L-1},$$

tend to infinity. In this limit, the network's behavior is governed by statistical regularities arising from the law of large numbers. As a result, the contribution of any individual neuron becomes negligible, and the network can be described through a mean field formulation that captures the evolution of distributions rather than finite-dimensional parameter vectors.

Key Mathematical Assumptions for Mean Field Analysis

  • Weights and Biases are Initialized Independently: the weights $W^{(\ell)}_{ij}$ and biases $b^{(\ell)}_i$ are assumed to be initialized independently across all layers and neurons, and identically distributed within each layer;
  • Initialization Scaling: weights are initialized as $W^{(\ell)}_{ij} \sim \mathcal{N}(0, \sigma_w^2 / N_{\ell-1})$ and biases as $b^{(\ell)}_i \sim \mathcal{N}(0, \sigma_b^2)$, where $\sigma_w^2$ and $\sigma_b^2$ are fixed variances; this scaling ensures the pre-activations $h^{(\ell)}$ remain of order one as the width grows;
  • Activation Function Regularity: the activation function $\phi$ is chosen so that the moments of the activations exist and remain well-behaved under the induced distributions;
  • Input Independence: the input $x^{(0)}$ has a fixed distribution independent of the network's width.

These assumptions ensure that as the width of each hidden layer increases, the network's behavior can be described using mean field theory.
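
As a quick sanity check of the scaling assumption, the sketch below draws parameters as in the second bullet and measures the empirical variance of the hidden-layer pre-activations as the width grows. The specific values sigma_w = 1.5 and sigma_b = 0.1, the tanh nonlinearity, and the widths are illustrative assumptions, not part of the text.

```python
import numpy as np

def init_params(widths, sigma_w=1.5, sigma_b=0.1, seed=0):
    """Draw W^(l)_ij ~ N(0, sigma_w^2 / N_{l-1}) and b^(l)_i ~ N(0, sigma_b^2)."""
    rng = np.random.default_rng(seed)
    weights = [rng.normal(0.0, sigma_w / np.sqrt(widths[l - 1]),
                          size=(widths[l], widths[l - 1]))
               for l in range(1, len(widths))]
    biases = [rng.normal(0.0, sigma_b, size=widths[l]) for l in range(1, len(widths))]
    return weights, biases

# Empirical check: with the 1 / N_{l-1} scaling, the per-neuron pre-activation
# variance in the hidden layers stays of order one as the width N grows.
x0 = np.random.default_rng(1).normal(size=10)
for N in (64, 256, 1024, 4096):
    widths = [10, N, N, 1]                          # N_0, N_1, N_2, N_3
    weights, biases = init_params(widths)
    x, variances = x0, []
    for W, b in zip(weights[:-1], biases[:-1]):     # hidden layers only
        h = W @ x + b
        variances.append(round(float(np.var(h)), 3))
        x = np.tanh(h)
    print(f"width {N}: hidden pre-activation variances {variances}")
```

As the width increases, the printed variances concentrate around a width-independent value, which is exactly what the $1/N_{\ell-1}$ scaling is designed to achieve; without it, the pre-activation variance would grow linearly with the width of the previous layer.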

The infinite-width limit is analytically tractable because, as the number of neurons in each layer grows, the central limit theorem implies that the pre-activations at each layer become Gaussian distributed, provided the independence and scaling assumptions hold. This allows the complex, high-dimensional behavior of the network to be described by the evolution of distributional quantities — such as means and variances — rather than by tracking every individual parameter. As a result, the network's behavior can be captured by deterministic equations in the limit of infinite width, greatly simplifying analysis. This tractability enables you to derive precise predictions about signal propagation, training dynamics, and generalization properties, and it forms the foundation for mean field theory in neural networks.
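
As one concrete example of such a deterministic equation, a standard mean-field result (stated here under the assumptions above, for the zero-mean Gaussian limit of the pre-activations) is the layer-to-layer recursion for the per-neuron pre-activation variance $q^{(\ell)}$:

$$
q^{(1)} = \sigma_w^2 \, \frac{\lVert x^{(0)} \rVert^2}{N_0} + \sigma_b^2,
\qquad
q^{(\ell)} = \sigma_w^2 \, \mathbb{E}_{z \sim \mathcal{N}(0,\, q^{(\ell-1)})}\!\left[\phi(z)^2\right] + \sigma_b^2, \quad \ell \ge 2.
$$

A single scalar per layer thus suffices to describe how the magnitude of the forward signal propagates in the infinite-width limit.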
