
Emergence and Scaling Laws

You will often hear that as models grow larger and are trained on more data, they not only improve at tasks they were already doing, but also begin to display entirely new abilities. This phenomenon is at the heart of scaling laws — empirical and theoretical relationships that describe how a model's performance changes as you increase its size (number of parameters), the amount of data it sees, or the compute used during training. In practice, researchers have observed that as you scale up these factors, models do not simply become incrementally better; instead, they sometimes exhibit sudden, qualitative jumps in capability. These jumps are called emergent abilities.

Scaling laws provide a way to predict how much improvement you might expect by increasing model size or dataset size. For instance, doubling the number of parameters might yield a predictable drop in error rate, but at certain scales, the model might suddenly master a new skill it previously could not — such as understanding jokes, following complex instructions, or performing arithmetic.

Empirical studies have shown that, up to a point, the relationship between model scale and performance follows a smooth curve, often a power law. However, at certain thresholds, new behaviors seem to "emerge" almost as if the model underwent a phase transition — a sudden change in state, similar to how water turns to ice at 0°C. These emergent abilities are not just better performance, but qualitatively different behaviors that were not present in smaller models.

A simplified scaling law equation

A basic form of a scaling law can be written as:

L = aN^{-\alpha} + b

where L is the loss (error), N is the number of model parameters (or data size), a and b are constants, and \alpha is the scaling exponent. This equation predicts that as you increase N, the loss L decreases smoothly.
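A minimal sketch of how this equation behaves is below. The constants a, b, and \alpha are made-up illustrative values, not fitted to any real model; the point is only that each doubling of N shrinks the reducible part of the loss by a fixed factor of 2^{-\alpha}.

```python
# Toy scaling-law sketch: L = a * N^(-alpha) + b
# The constants below are invented for illustration, not fitted to any real model.
a, b, alpha = 100.0, 1.5, 0.3

def predicted_loss(n_params: float) -> float:
    """Loss predicted by the simplified power-law scaling equation."""
    return a * n_params ** (-alpha) + b

for n in [1e6, 1e7, 1e8, 1e9]:
    print(f"N = {n:.0e}: predicted loss = {predicted_loss(n):.3f}")

# Doubling N multiplies the reducible term a * N^(-alpha) by 2^(-alpha),
# so each doubling buys a predictable but progressively smaller improvement.
print(f"Each doubling of N scales the reducible loss by {2 ** (-alpha):.3f}")
```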

Emergent abilities and phase transitions

Emergent abilities do not arise gradually, but often appear abruptly once a model crosses a certain scale. This is analogous to a phase transition in physics, where a small change in temperature can suddenly turn water into ice. In models, once a critical threshold is passed, new qualitative behaviors — such as zero-shot reasoning or compositional generalization — can suddenly emerge.
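To make the contrast concrete, here is a toy illustration, not a real experiment: the loss follows the smooth power law above, but a hypothetical downstream task is assumed to be solvable only once the loss falls below some critical value, so the task ability appears to switch on abruptly with scale.

```python
# Toy illustration of "emergence": smooth loss curve, abrupt downstream ability.
# All numbers here are invented for the sketch, not measurements from real models.
a, b, alpha = 100.0, 1.5, 0.3
critical_loss = 2.0  # hypothetical threshold below which the task becomes solvable

for n in [1e6, 1e7, 1e8, 1e9, 1e10]:
    loss = a * n ** (-alpha) + b
    # The task flips from "absent" to "emerges" once the loss crosses the threshold,
    # even though the loss itself changes smoothly.
    ability = "emerges" if loss < critical_loss else "absent"
    print(f"N = {n:.0e}: loss = {loss:.2f} -> task ability {ability}")
```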

Note

Emergence is not simply more of the same. It refers to the appearance of qualitatively new behaviors that were not present, even in a weaker form, in smaller or less-trained models.


