Limits, Transfer and Future Directions
Zero-Shot and Few-Shot Generalization

Failure Modes and Generalization Limits

When working with large language models (LLMs), you may expect them to generalize well to new tasks with zero or few examples. However, there are well-recognized theoretical and practical boundaries to this capability. Understanding why and where zero-shot and few-shot generalization break down is crucial for using LLMs effectively. Four failure modes come up repeatedly: ambiguity, hallucination, context saturation, and domain shift.

Ambiguity arises when a prompt or task description is open to multiple interpretations. LLMs rely on statistical associations from their training data, so when faced with ambiguous instructions, the model may choose an unintended interpretation, leading to unpredictable or incorrect outputs.
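
As a toy illustration (hypothetical prompts, no particular API assumed), compare an instruction that admits several readings with one that pins down scope and output format:

```python
# Hypothetical prompts for illustration only; no model call is made here.
# The first instruction is ambiguous: it could mean "critique the function",
# "summarize what it does", or "check its style".
ambiguous_prompt = "Review this function."

# The second states the intended reading explicitly, which narrows the
# interpretations the model can reasonably choose from.
disambiguated_prompt = (
    "Review this function for correctness bugs only. "
    "Return a numbered list, and for each item quote the offending line "
    "and explain the problem in one sentence."
)
```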

Hallucinations refer to the phenomenon where LLMs generate plausible-sounding but false or unsupported information. This occurs because LLMs have no built-in mechanism to verify facts; they only generate text that statistically fits the prompt and context. As a result, in tasks requiring factual accuracy or external validation, LLMs may confidently produce incorrect statements.
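
Because the model itself does not validate its claims, any factual check has to be layered on from outside. A minimal sketch of that idea, assuming you have a trusted source document to compare against (a toy heuristic, not a production fact-checker):

```python
# Toy sketch: flag sentences in a model answer that share almost no content
# words with a trusted source document. The overlap threshold and the
# word-based heuristic are illustrative assumptions, not a real method.
import re

def content_words(text: str) -> set[str]:
    """Lowercased words of four or more letters, as a crude content signal."""
    return {w.lower() for w in re.findall(r"[A-Za-z]{4,}", text)}

def flag_unsupported(answer: str, source: str, min_overlap: int = 2) -> list[str]:
    """Return sentences from `answer` with too little overlap with `source`."""
    source_words = content_words(source)
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        if sentence and len(content_words(sentence) & source_words) < min_overlap:
            flagged.append(sentence)
    return flagged
```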

Context saturation happens when the prompt or input context is too long or complex for the model to process effectively. LLMs have a finite context window, and when this limit is exceeded, important information may be truncated or ignored. This can lead to degraded performance, especially in tasks that require integrating information across a lengthy context.
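
One practical mitigation is to measure the input in tokens before sending it and trim it to fit. The sketch below assumes the `tiktoken` tokenizer and an illustrative 8,192-token window; the actual tokenizer and limit depend on the model you use:

```python
# Sketch of guarding against context saturation. The window size and the
# choice of tokenizer are assumptions for illustration; real limits are
# model-specific.
import tiktoken

CONTEXT_WINDOW = 8192        # illustrative; check your model's actual limit
RESERVED_FOR_OUTPUT = 1024   # leave room for the model's reply

def fit_to_window(instructions: str, document: str) -> str:
    enc = tiktoken.get_encoding("cl100k_base")
    budget = CONTEXT_WINDOW - RESERVED_FOR_OUTPUT - len(enc.encode(instructions))
    doc_tokens = enc.encode(document)
    if len(doc_tokens) > budget:
        # Naive truncation: keep only the head of the document. In practice,
        # chunking, summarizing, or retrieving only the relevant passages
        # usually preserves more of the important information.
        document = enc.decode(doc_tokens[:budget])
    return instructions + "\n\n" + document
```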

Domain shift is another major failure mode. LLMs are trained on a wide variety of data, but when presented with tasks or data distributions that are significantly different from their training set, their performance can drop sharply. This is because the statistical patterns learned during training may not apply to the new domain, resulting in poor generalization.

Mathematical limits of prompt-based generalization

The performance of LLMs in zero-shot and few-shot settings is fundamentally limited by the statistical properties of their training data and model architecture. If a task requires reasoning patterns or knowledge not present in the training data, the model cannot invent solutions beyond its learned distributions. Additionally, the finite capacity of the model means that it cannot store or retrieve every possible combination of facts or rules.
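
This can be stated loosely in the style of classic domain-adaptation bounds; treat the notation below as an informal sketch rather than a precise theorem:

```latex
% Informal statement: a model h that does well on the training distribution is
% only guaranteed to do comparably well on a new task distribution if the two
% distributions are close; d(.,.) is some divergence between them and
% \lambda is the best error any single model could achieve on both.
\epsilon_{\text{test}}(h) \;\le\; \epsilon_{\text{train}}(h)
  \;+\; d\!\left(\mathcal{D}_{\text{train}}, \mathcal{D}_{\text{test}}\right)
  \;+\; \lambda
```

When a prompt asks for something far from anything in the training distribution, the divergence term dominates, and no amount of prompt engineering closes that gap.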

Conceptual limits of prompt-based generalization

Even with carefully engineered prompts, LLMs cannot perform tasks that require explicit new knowledge, logical inference outside their training scope, or reasoning about truly novel concepts. Prompt-based generalization is constrained to what the model has implicitly learned; it cannot extrapolate beyond its conceptual boundaries without additional training or external tools.

Note

Not all tasks are equally generalizable. Tasks that require memorization, precise calculation, or access to up-to-date information may need explicit training or integration of new knowledge sources, rather than relying solely on prompt-based generalization.
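
For example, instead of asking the model to do arithmetic by prompt alone, such tasks can be routed to ordinary code. The sketch below is hypothetical: `ask_llm` is a stand-in for whatever client you actually use, not a real API:

```python
# Hypothetical routing layer: precise arithmetic is computed deterministically
# in Python; everything else falls back to the model. `ask_llm` is a stub.
import ast
import operator

SAFE_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
}

def safe_eval(expr: str) -> float:
    """Evaluate a simple arithmetic expression exactly, without the LLM."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in SAFE_OPS:
            return SAFE_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

def ask_llm(question: str) -> str:
    # Stub standing in for a real model call.
    return "model response"

def answer(question: str, expression: str | None = None) -> str:
    if expression is not None:
        return str(safe_eval(expression))   # precise and verifiable
    return ask_llm(question)                # statistical, may hallucinate
```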


What best describes the failure mode known as hallucination in large language models?

Select the correct answer
