Transformer Architecture

Understanding Positional Encoding


Transformers process all tokens in a sequence simultaneously – unlike RNNs, they have no built-in notion of order. This means without additional information, the model cannot distinguish between "dog bites man" and "man bites dog".

Positional encoding solves this by adding a position-dependent vector to each token's embedding before it enters the transformer. The model then has access to both the token's meaning and its position in the sequence.
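This addition is purely element-wise. A minimal sketch of the shapes involved, using random stand-ins for the learned token embeddings and the position vectors from the worked example later in this page:

```python
import numpy as np

# Hypothetical toy setup: 3 tokens with d_model = 4. Real token
# embeddings come from a learned lookup table; random numbers
# stand in for them here.
rng = np.random.default_rng(0)
tok = rng.standard_normal((3, 4))     # (seq_len, d_model) token embeddings

pos_enc = np.array([                  # one encoding vector per position
    [0.000,  1.000, 0.000, 1.000],    # position 0
    [0.841,  0.540, 0.010, 1.000],    # position 1
    [0.909, -0.416, 0.020, 1.000],    # position 2
])

x = tok + pos_enc                     # element-wise sum enters the first layer
```

Because the sum keeps the shape `(seq_len, d_model)`, the rest of the transformer is unchanged; it simply sees vectors that encode both meaning and position.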

Sinusoidal Encoding

The original transformer uses sine and cosine functions of varying frequencies to construct a unique encoding for each position:

  • Even dimensions:
    $PE_{(pos,\,2i)} = \sin\!\left(\dfrac{pos}{10000^{2i/d_{model}}}\right)$
  • Odd dimensions:
    $PE_{(pos,\,2i+1)} = \cos\!\left(\dfrac{pos}{10000^{2i/d_{model}}}\right)$

Different dimensions use different frequencies – lower dimensions oscillate quickly, higher dimensions change slowly. Together they form a unique fingerprint for each position that generalizes to sequence lengths not seen during training.
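The two formulas above can be implemented in a few lines of NumPy. This is a sketch rather than any particular library's implementation; the function name is our own:

```python
import numpy as np

def sinusoidal_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) matrix of sinusoidal positional encodings."""
    positions = np.arange(seq_len)[:, np.newaxis]   # (seq_len, 1)
    dims = np.arange(0, d_model, 2)                 # even dimension indices 2i
    freqs = 1.0 / (10000 ** (dims / d_model))       # one frequency per sin/cos pair
    angles = positions * freqs                      # (seq_len, d_model / 2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                    # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)                    # odd dimensions get cosine
    return pe
```

Note that each sine/cosine pair shares one frequency, and the frequencies decay geometrically from 1 down to 1/10000 across the dimensions, which produces the fast-to-slow oscillation pattern described above.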

A Worked Example

For a sequence of length 3 with d_model = 4:

| Position | PE(pos, 0) | PE(pos, 1) | PE(pos, 2) | PE(pos, 3) |
| --- | --- | --- | --- | --- |
| 0 | sin(0) = 0.0 | cos(0) = 1.0 | sin(0) = 0.0 | cos(0) = 1.0 |
| 1 | sin(1) ≈ 0.841 | cos(1) ≈ 0.540 | sin(0.01) ≈ 0.010 | cos(0.01) ≈ 1.000 |
| 2 | sin(2) ≈ 0.909 | cos(2) ≈ −0.416 | sin(0.02) ≈ 0.020 | cos(0.02) ≈ 1.000 |

Each row is a unique vector added to the corresponding token embedding. Notice how columns 0–1 change rapidly while columns 2–3 change slowly – this multi-frequency structure is what makes each position distinguishable.
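The table can be reproduced directly from the formulas. For d_model = 4 the pairwise frequencies are 1/10000^0 = 1.0 and 1/10000^(2/4) = 0.01, which is why columns 2–3 use arguments of 0.01 per position step:

```python
import numpy as np

# Recompute the table above for seq_len = 3, d_model = 4.
pos = np.arange(3)[:, None]                          # positions 0, 1, 2
freqs = 1.0 / (10000 ** (np.arange(0, 4, 2) / 4))    # frequencies [1.0, 0.01]
pe = np.zeros((3, 4))
pe[:, 0::2] = np.sin(pos * freqs)                    # even columns: sine
pe[:, 1::2] = np.cos(pos * freqs)                    # odd columns: cosine

# Rounded to 3 decimals, the rows are approximately:
#   [0.0,    1.0,    0.0,   1.0  ]
#   [0.841,  0.540,  0.010, 1.000]
#   [0.909, -0.416,  0.020, 1.000]
```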
