Lazy Training and Inductive Bias
Lazy training is a phenomenon that emerges when you consider neural networks in the infinite-width limit, as described by the linearization approach from previous sections. In this regime, the parameters of the network change so little during training that the network's behavior is closely approximated by its first-order Taylor expansion around the initial parameters. Rather than learning new features or representations, the network effectively acts as a linear model in its parameters. The term "lazy" refers to the fact that the network does not significantly update its internal representations, relying instead on the random features it started with. As a result, the training dynamics are governed almost entirely by the neural tangent kernel (NTK) fixed at initialization, and the network's evolution is described by kernel regression with this NTK.
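To make the linearization concrete, here is a minimal sketch in JAX. The two-layer network, its width, and helper names such as `init_params` and `f_lin` are illustrative assumptions, not part of any particular library. It builds the first-order Taylor expansion of a network around its initial parameters; in the lazy regime, the true network stays close to this linear model throughout training.

```python
import jax
import jax.numpy as jnp

# Toy two-layer network (illustrative assumption):
# params is a (W1, W2) tuple, x is a single input vector.
def init_params(key, d_in=2, width=512):
    k1, k2 = jax.random.split(key)
    W1 = jax.random.normal(k1, (width, d_in)) / jnp.sqrt(d_in)
    W2 = jax.random.normal(k2, (width,)) / jnp.sqrt(width)
    return (W1, W2)

def f(params, x):
    W1, W2 = params
    return W2 @ jax.nn.relu(W1 @ x)  # scalar output

# First-order Taylor expansion around params0:
#   f_lin(params, x) = f(params0, x) + <grad_theta f(params0, x), params - params0>
def f_lin(params, params0, x):
    delta = jax.tree_util.tree_map(lambda p, p0: p - p0, params, params0)
    y0, jvp = jax.jvp(lambda p: f(p, x), (params0,), (delta,))
    return y0 + jvp

key = jax.random.PRNGKey(0)
params0 = init_params(key)
x = jnp.ones(2)

# Small parameter perturbation: in the lazy regime the trained network
# stays in this neighborhood, so f and f_lin remain nearly identical.
params = jax.tree_util.tree_map(lambda p: p + 1e-3, params0)
print(f(params, x), f_lin(params, params0, x))
```

Note that `jax.jvp` returns both `f(params0, x)` and the Jacobian-vector product with the parameter displacement, which are exactly the two terms of the Taylor expansion, so no explicit Jacobian is ever materialized.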
The inductive bias in the NTK regime is fundamentally tied to the properties of the kernel itself. Since the NTK is determined by the network's architecture and its random initialization, it encodes the kinds of functions the network can represent and generalize. Different architectures, such as fully connected networks versus convolutional networks, produce different NTKs and thus different inductive biases. For example, a convolutional architecture yields an NTK that favors translation-invariant solutions, while a fully connected network's NTK does not. The initialization also plays a crucial role: the distribution of the initial weights shapes the NTK and thus the implicit regularization imposed on the learning process. In the NTK regime, this inductive bias is static throughout training because the kernel does not evolve; this contrasts with the finite-width case, where the kernel can change and feature learning can occur. As a consequence, the generalization performance and the types of functions learned in the NTK regime are limited by the expressive power of the fixed kernel, rather than by the network's ability to adapt its features during training.
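The inductive bias of the frozen kernel can be probed directly: the lazy-regime predictor is kernel (ridge) regression with the empirical NTK at initialization. The sketch below, under the same illustrative assumptions as before (toy two-layer network, hypothetical helper names), forms the NTK Gram matrix from per-example parameter gradients and solves the regression; swapping in a different architecture or initialization changes the kernel, and with it the inductive bias.

```python
import jax
import jax.numpy as jnp

# Same toy network as in the previous sketch (illustrative assumption).
def init_params(key, d_in=2, width=512):
    k1, k2 = jax.random.split(key)
    W1 = jax.random.normal(k1, (width, d_in)) / jnp.sqrt(d_in)
    W2 = jax.random.normal(k2, (width,)) / jnp.sqrt(width)
    return (W1, W2)

def f(params, x):
    W1, W2 = params
    return W2 @ jax.nn.relu(W1 @ x)

# Per-example parameter gradient, flattened into a feature vector.
# The empirical NTK is the Gram matrix of these features:
#   K(x, x') = <grad_theta f(x), grad_theta f(x')> at params0.
def feat(params0, x):
    g = jax.grad(f)(params0, x)
    return jnp.concatenate([l.ravel() for l in jax.tree_util.tree_leaves(g)])

# Kernel ridge regression with the frozen NTK. For simplicity this
# ignores the network's initial output f(params0, x), which enters the
# exact lazy-training solution as a residual term.
def ntk_regression(params0, X_train, y_train, X_test, lam=1e-6):
    Phi_tr = jax.vmap(lambda x: feat(params0, x))(X_train)
    Phi_te = jax.vmap(lambda x: feat(params0, x))(X_test)
    K = Phi_tr @ Phi_tr.T           # NTK Gram matrix on training points
    k_star = Phi_te @ Phi_tr.T      # test-train cross kernel
    alpha = jnp.linalg.solve(K + lam * jnp.eye(K.shape[0]), y_train)
    return k_star @ alpha

key = jax.random.PRNGKey(0)
params0 = init_params(key)
X = jax.random.normal(key, (16, 2))
y = jnp.sin(2.0 * X[:, 0])
preds = ntk_regression(params0, X, y, X)  # near-interpolation of y
```

Because the kernel is computed once at `params0` and never updated, everything this predictor can learn is fixed before training starts, which is precisely the static inductive bias described above.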