Transformer Architecture

Understanding Positional Encoding


Transformers process all tokens in a sequence simultaneously – unlike RNNs, they have no built-in notion of order. This means without additional information, the model cannot distinguish between "dog bites man" and "man bites dog".

Positional encoding solves this by adding a position-dependent vector to each token's embedding before it enters the transformer. The model then has access to both the token's meaning and its position in the sequence.
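This addition step can be sketched in a few lines of NumPy. The shapes below (3 tokens, a model dimension of 4) are chosen only for illustration, and the positional encoding is left as zeros here since its construction is covered next:

```python
import numpy as np

# Illustrative shapes: 3 tokens, d_model = 4.
seq_len, d_model = 3, 4

token_embeddings = np.random.randn(seq_len, d_model)  # one row per token
positional_encoding = np.zeros((seq_len, d_model))    # placeholder; built below

# The transformer input is simply the element-wise sum:
model_input = token_embeddings + positional_encoding
```

Because the two matrices are added rather than concatenated, the model dimension stays unchanged and every layer sees meaning and position blended into the same vector.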

Sinusoidal Encoding

The original transformer uses sine and cosine functions of varying frequencies to construct a unique encoding for each position:

  • Even dimensions:
$$PE(pos, 2i) = \sin\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$
  • Odd dimensions:
$$PE(pos, 2i+1) = \cos\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$

Different dimensions use different frequencies – lower dimensions oscillate quickly, higher dimensions change slowly. Together they form a unique fingerprint for each position that generalizes to sequence lengths not seen during training.
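The two formulas above translate directly into a small NumPy function. This is a minimal sketch, not the code from any particular library; the function name is our own:

```python
import numpy as np

def sinusoidal_encoding(seq_len, d_model):
    """Build the (seq_len, d_model) sinusoidal positional encoding matrix."""
    pos = np.arange(seq_len)[:, np.newaxis]   # column of positions 0..seq_len-1
    two_i = np.arange(0, d_model, 2)          # the even dimension indices 2i
    angles = pos / (10000 ** (two_i / d_model))  # pos / 10000^(2i / d_model)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)              # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)              # odd dimensions get cosine
    return pe
```

Note that `seq_len` is just an argument: the same function produces valid encodings for any sequence length, which is exactly the extrapolation property mentioned above.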

A Worked Example

For a sequence of length 3 with d_model = 4:

| Position | PE(pos, 0)     | PE(pos, 1)      | PE(pos, 2)        | PE(pos, 3)        |
|----------|----------------|-----------------|-------------------|-------------------|
| 0        | sin(0) = 0.0   | cos(0) = 1.0    | sin(0) = 0.0      | cos(0) = 1.0      |
| 1        | sin(1) ≈ 0.841 | cos(1) ≈ 0.540  | sin(0.01) ≈ 0.010 | cos(0.01) ≈ 1.000 |
| 2        | sin(2) ≈ 0.909 | cos(2) ≈ −0.416 | sin(0.02) ≈ 0.020 | cos(0.02) ≈ 1.000 |

Each row is a unique vector added to the corresponding token embedding. Notice how columns 0–1 change rapidly while columns 2–3 change slowly – this multi-frequency structure is what makes each position distinguishable.
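You can check the table yourself by computing the encoding inline for this tiny case. The arithmetic below is a self-contained sketch of the same formulas:

```python
import numpy as np

seq_len, d_model = 3, 4
pos = np.arange(seq_len)[:, np.newaxis]       # positions 0, 1, 2
two_i = np.arange(0, d_model, 2)              # even dimension indices: 0, 2
angles = pos / (10000 ** (two_i / d_model))   # frequencies 1 and 1/100

pe = np.zeros((seq_len, d_model))
pe[:, 0::2] = np.sin(angles)
pe[:, 1::2] = np.cos(angles)

print(np.round(pe, 3))  # row 1 is approximately [0.841, 0.540, 0.010, 1.000]
```

The fast columns (frequency 1) separate nearby positions, while the slow columns (frequency 1/100) would only start to differ noticeably over much longer sequences.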


Section 1. Chapter 6
