Emergence and Scaling Laws

You will often hear that as models grow larger and are trained on more data, they not only improve at tasks they were already doing, but also begin to display entirely new abilities. This phenomenon is at the heart of scaling laws — empirical and theoretical relationships that describe how a model's performance changes as you increase its size (number of parameters), the amount of data it sees, or the compute used during training. In practice, researchers have observed that as you scale up these factors, models do not simply become incrementally better; instead, they sometimes exhibit sudden, qualitative jumps in capability. These jumps are called emergent abilities.

Scaling laws provide a way to predict how much improvement you might expect by increasing model size or dataset size. For instance, doubling the number of parameters might yield a predictable drop in error rate, but at certain scales, the model might suddenly master a new skill it previously could not — such as understanding jokes, following complex instructions, or performing arithmetic.

Empirical studies have shown that, up to a point, the relationship between model scale and performance follows a smooth curve, often a power law. However, at certain thresholds, new behaviors seem to "emerge" almost as if the model underwent a phase transition — a sudden change in state, similar to how water turns to ice at 0°C. These emergent abilities are not just better performance, but qualitatively different behaviors that were not present in smaller models.

A simplified scaling law equation

A basic form of a scaling law can be written as:

L = aN^{-\alpha} + b

where L is the loss (error), N is the number of model parameters (or data size), a and b are constants, and α is the scaling exponent. This equation predicts that as you increase N, the loss L decreases smoothly.
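
To make this concrete, here is a minimal sketch that evaluates the scaling law at increasing parameter counts. The constants a, b, and α below are purely illustrative, not values fitted to any real model family:

```python
# Illustrative only: a, b, and alpha are made-up constants,
# not values fitted to any real model.
a, b, alpha = 10.0, 1.5, 0.5   # b acts as an irreducible loss floor

def predicted_loss(n_params: float) -> float:
    """Scaling-law prediction: L = a * N^(-alpha) + b."""
    return a * n_params ** (-alpha) + b

for n in [1e6, 1e7, 1e8, 1e9]:
    print(f"N = {n:.0e}  ->  predicted loss = {predicted_loss(n):.4f}")
```

With α = 0.5, doubling N shrinks the reducible term aN^{-α} by a factor of 2^{-0.5} ≈ 0.71 — exactly the kind of smooth, predictable improvement scaling laws are meant to capture.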

Emergent abilities and phase transitions

Emergent abilities do not arise gradually, but often appear abruptly once a model crosses a certain scale. This is analogous to a phase transition in physics, where a small change in temperature can suddenly turn water into ice. In models, once a critical threshold is passed, new qualitative behaviors — such as zero-shot reasoning or compositional generalization — can suddenly emerge.
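
The contrast can be illustrated with a toy sketch (not data from any real experiment): the loss follows a smooth power law across scales, while a downstream task metric stays near zero until a hypothetical critical scale and then jumps, mimicking a phase transition:

```python
import numpy as np

# Toy illustration: smooth loss decay vs. an abrupt jump in a task metric.
# The constants and the emergence threshold are hypothetical.
log_n = np.linspace(6, 12, 7)                      # log10 of parameter count
loss = 10.0 * (10.0 ** log_n) ** -0.3 + 1.5        # smooth power-law decay
critical = 9.5                                     # hypothetical emergence threshold
task_acc = 1.0 / (1.0 + np.exp(-8.0 * (log_n - critical)))  # sharp sigmoid jump

for ln, l, acc in zip(log_n, loss, task_acc):
    print(f"log10(N) = {ln:4.1f}   loss = {l:7.4f}   task accuracy = {acc:5.2f}")
```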

Note

Emergence is not simply more of the same. It refers to the appearance of qualitatively new behaviors that were not present, even in a weaker form, in smaller or less-trained models.
