Transformer Architecture

Understanding Positional Encoding


Transformers process all tokens in a sequence simultaneously – unlike RNNs, they have no built-in notion of order. This means without additional information, the model cannot distinguish between "dog bites man" and "man bites dog".

Positional encoding solves this by adding a position-dependent vector to each token's embedding before it enters the transformer. The model then has access to both the token's meaning and its position in the sequence.
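This addition step can be sketched in a few lines of NumPy. All names here are illustrative, and the embeddings are random placeholders standing in for a real embedding lookup:

```python
import numpy as np

seq_len, d_model = 3, 4
token_embeddings = np.random.randn(seq_len, d_model)     # stand-in for an embedding lookup
positional_encoding = np.random.randn(seq_len, d_model)  # any position-dependent table

# The combination is a plain element-wise addition, applied once
# before the sequence enters the first transformer layer.
model_input = token_embeddings + positional_encoding
print(model_input.shape)  # (3, 4)
```

Because the two vectors are simply summed, both signals live in the same d_model-dimensional space and the rest of the network needs no changes.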

Sinusoidal Encoding

The original transformer uses sine and cosine functions of varying frequencies to construct a unique encoding for each position:

  • Even dimensions:
PE(pos, 2i) = \sin\left( \frac{pos}{10000^{2i/d_{model}}} \right)
  • Odd dimensions:
PE(pos, 2i+1) = \cos\left( \frac{pos}{10000^{2i/d_{model}}} \right)

Different dimensions use different frequencies – lower dimensions oscillate quickly, higher dimensions change slowly. Together they form a unique fingerprint for each position that generalizes to sequence lengths not seen during training.
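The two formulas translate directly into a short NumPy function. This is a minimal sketch (function name and variable names are my own); note that the even dimension indices 0, 2, 4, … are exactly the 2i values in the exponent:

```python
import numpy as np

def sinusoidal_encoding(seq_len, d_model):
    """Build the (seq_len, d_model) sinusoidal positional encoding matrix."""
    positions = np.arange(seq_len)[:, None]      # shape (seq_len, 1)
    two_i = np.arange(0, d_model, 2)[None, :]    # even dimension indices: 0, 2, ...
    angles = positions / 10000.0 ** (two_i / d_model)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                 # even dimensions use sine
    pe[:, 1::2] = np.cos(angles)                 # odd dimensions use cosine
    return pe

pe = sinusoidal_encoding(50, 128)  # works for any length; no learned parameters
print(pe.shape)                    # (50, 128)
```

Since the table is computed from a formula rather than learned, the same function can produce encodings for sequences longer than anything seen during training.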

A Worked Example

For a sequence of length 3 with d_model = 4:

| Position | PE(pos, 0)     | PE(pos, 1)      | PE(pos, 2)         | PE(pos, 3)         |
|----------|----------------|-----------------|--------------------|--------------------|
| 0        | sin(0) = 0.0   | cos(0) = 1.0    | sin(0) = 0.0       | cos(0) = 1.0       |
| 1        | sin(1) ≈ 0.841 | cos(1) ≈ 0.540  | sin(0.01) ≈ 0.010  | cos(0.01) ≈ 1.000  |
| 2        | sin(2) ≈ 0.909 | cos(2) ≈ −0.416 | sin(0.02) ≈ 0.020  | cos(0.02) ≈ 1.000  |

Each row is a unique vector added to the corresponding token embedding. Notice how columns 0–1 change rapidly while columns 2–3 change slowly – this multi-frequency structure is what makes each position distinguishable.
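The table above can be reproduced directly. This self-contained snippet (variable names are illustrative) computes the full 3×4 encoding and rounds it for comparison:

```python
import numpy as np

seq_len, d_model = 3, 4
pos = np.arange(seq_len)[:, None]          # positions 0, 1, 2 as a column
two_i = np.arange(0, d_model, 2)[None, :]  # even dimension indices 0 and 2
angles = pos / 10000.0 ** (two_i / d_model)

pe = np.zeros((seq_len, d_model))
pe[:, 0::2] = np.sin(angles)               # columns 0 and 2
pe[:, 1::2] = np.cos(angles)               # columns 1 and 3

# Each row matches the corresponding row of the table, e.g.
# position 1 gives roughly [0.841, 0.540, 0.010, 1.000].
print(np.round(pe, 3))
```

For d_model = 4, the exponent 2i/d_model is 0 for columns 0–1 and 0.5 for columns 2–3, so the angles are pos and pos/100 respectively, which is exactly why the last two columns barely move across three positions.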


Section 1. Chapter 6
