Transformers Theory Essentials

Autoregressive Generation Mechanism

Transformers generate text using an autoregressive generation mechanism. In this approach, the model predicts one token at a time in a sequence. Each time the model generates a token, it uses all previously generated tokens as input for predicting the next one. This creates a feedback loop: the output at each step becomes part of the context for the following prediction. The process continues until a special end-of-sequence token is produced or a maximum length is reached. This sequential prediction ensures that the generated text remains coherent, as each new token is chosen based on both the original input (if any) and the growing output sequence.
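The loop below is a minimal sketch of this mechanism in PyTorch. The model here is a toy stand-in (just an embedding layer and a vocabulary projection, with made-up vocabulary size, token ids, and end-of-sequence id); a real transformer would apply masked self-attention blocks in between, but the generation loop itself looks the same.

```python
import torch
import torch.nn as nn

# Toy stand-in for a transformer decoder: an embedding layer followed by a
# projection back to the vocabulary. A real model would apply masked
# self-attention blocks in between, but the generation loop is identical.
VOCAB_SIZE, HIDDEN = 100, 32
EOS_ID, MAX_NEW_TOKENS = 0, 20

embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
to_logits = nn.Linear(HIDDEN, VOCAB_SIZE)

def model(token_ids: torch.Tensor) -> torch.Tensor:
    """Return next-token logits for every position, shape (seq_len, vocab_size)."""
    return to_logits(embed(token_ids))

prompt = torch.tensor([5, 17, 42])            # pretend prompt token ids
generated = prompt.clone()

with torch.no_grad():
    for _ in range(MAX_NEW_TOKENS):
        logits = model(generated)             # condition on all tokens so far
        next_id = logits[-1].argmax()         # greedy choice from the last position
        generated = torch.cat([generated, next_id.unsqueeze(0)])
        if next_id.item() == EOS_ID:          # stop at the end-of-sequence token
            break

print(generated.tolist())
```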

A key aspect of this process is the propagation of hidden states through the transformer layers. When predicting a token, the model transforms the input sequence into a set of hidden states — these are vectors representing the current context and meaning of each token so far. After generating a new token, the model updates its hidden states to include the effect of this token. This means that every new prediction is shaped by all previous tokens, as their representations have been woven into the hidden states. As the sequence grows, the influence of earlier tokens persists, allowing the transformer to maintain long-range dependencies and context throughout the generation process.
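In decoder implementations this reuse is commonly realized as a key/value cache: the attention keys and values computed for earlier tokens are stored, so each step only has to process the newest token while still attending over everything generated so far. The single-head sketch below illustrates the idea; the weights, dimensions, and random inputs are invented for illustration and not taken from any real model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Single-head self-attention step with a key/value cache. Earlier tokens'
# keys and values are kept, so each new step only processes the newest token
# while still attending over the whole sequence generated so far.
HIDDEN = 32
w_q = nn.Linear(HIDDEN, HIDDEN, bias=False)
w_k = nn.Linear(HIDDEN, HIDDEN, bias=False)
w_v = nn.Linear(HIDDEN, HIDDEN, bias=False)

def attend_step(x_new: torch.Tensor, cache: dict) -> torch.Tensor:
    """Hidden state for one new token, reusing cached keys/values."""
    q = w_q(x_new).squeeze(0)                            # query for the new token only
    cache["k"] = torch.cat([cache["k"], w_k(x_new)])     # append this token's key
    cache["v"] = torch.cat([cache["v"], w_v(x_new)])     # ...and its value
    scores = cache["k"] @ q / HIDDEN ** 0.5              # attend over all tokens so far
    weights = F.softmax(scores, dim=0)
    return weights @ cache["v"]                          # context-aware representation

cache = {"k": torch.empty(0, HIDDEN), "v": torch.empty(0, HIDDEN)}
with torch.no_grad():
    for step in range(4):                                # pretend four generation steps
        x_new = torch.randn(1, HIDDEN)                   # embedding of the newest token
        h = attend_step(x_new, cache)
        print(step, h.shape, cache["k"].shape)           # the cache grows with the sequence
```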

Definition

Representation flow is the process by which information about previous tokens is carried forward through the hidden states at each generation step. This flow is crucial for maintaining context and coherence, as it allows the model to "remember" what has already been generated and use that information when making subsequent predictions.

Autoregressive Generation
  • Predicts tokens one by one, each time conditioning on all previously generated tokens;
  • Maintains strong causal structure, preserving context and coherence;
  • Can be slower at inference because each token depends on the previous output;
  • Enables fine control over output, useful for tasks requiring step-by-step reasoning.
Non-Autoregressive Generation
  • Predicts all or many tokens in parallel, not strictly conditioning each token on the previous ones;
  • Can be much faster at inference, as predictions are parallelized;
  • May struggle with coherence and context, especially for long or complex sequences;
  • Often used in applications where speed is prioritized over accuracy or when the output structure is simpler (the sketch after this list contrasts the two styles in code).
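Both branches below use the same kind of toy model as before (made-up vocabulary size, placeholder scheme, and sequence lengths): the autoregressive branch runs one forward pass per generated token, while the non-autoregressive branch fills all placeholder positions in a single pass.

```python
import torch
import torch.nn as nn

# Toy model: embedding plus vocabulary projection.
VOCAB_SIZE, HIDDEN, OUT_LEN = 100, 32, 5
net = nn.Sequential(nn.Embedding(VOCAB_SIZE, HIDDEN), nn.Linear(HIDDEN, VOCAB_SIZE))
prompt = torch.tensor([5, 17, 42])

with torch.no_grad():
    # Autoregressive: one forward pass per new token, each conditioned on the
    # tokens chosen so far.
    seq = prompt.clone()
    for _ in range(OUT_LEN):
        next_id = net(seq)[-1].argmax()
        seq = torch.cat([seq, next_id.unsqueeze(0)])

    # Non-autoregressive: a single forward pass over placeholder positions;
    # every output token is predicted at once, without seeing the others.
    placeholders = torch.zeros(OUT_LEN, dtype=torch.long)       # e.g. [MASK]-style inputs
    parallel = net(torch.cat([prompt, placeholders])).argmax(dim=-1)[-OUT_LEN:]

print("autoregressive:", seq[-OUT_LEN:].tolist())
print("parallel:      ", parallel.tolist())
```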

Which statements accurately describe autoregressive generation and the role of hidden states in transformers?


