
Understanding Positional Encoding


Transformers process all tokens in a sequence simultaneously – unlike RNNs, they have no built-in notion of order. This means without additional information, the model cannot distinguish between "dog bites man" and "man bites dog".

Positional encoding solves this by adding a position-dependent vector to each token's embedding before it enters the transformer. The model then has access to both the token's meaning and its position in the sequence.
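The effect of adding a position vector can be sketched with toy numbers. Everything here is an assumption made for illustration: the two-dimensional `embeddings`, the hand-picked `position_vectors`, and the helper `embed` are hypothetical and not taken from any real model.

```python
# Toy embeddings and position vectors; all values are made up for illustration.
embeddings = {
    "dog": [1.0, 0.0],
    "bites": [0.0, 1.0],
    "man": [1.0, 1.0],
}
position_vectors = [[0.1, 0.0], [0.0, 0.2], [0.3, 0.3]]  # hypothetical, one per position


def embed(tokens, use_positions):
    """Return one vector per token, optionally adding the position vector."""
    out = []
    for pos, tok in enumerate(tokens):
        vec = embeddings[tok]
        if use_positions:
            vec = [e + p for e, p in zip(vec, position_vectors[pos])]
        out.append(vec)
    return out


a = "dog bites man".split()
b = "man bites dog".split()

# Without positions, both sentences are the same collection of vectors.
same_without = sorted(embed(a, False)) == sorted(embed(b, False))
# With positions, "dog" at position 0 differs from "dog" at position 2.
same_with = sorted(embed(a, True)) == sorted(embed(b, True))
```

Sorting removes the order of the vectors, which roughly mimics a model that sees the tokens as an unordered set. Only the version with position vectors lets the two sentences be told apart.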

Sinusoidal Encoding

The original transformer uses sine and cosine functions of varying frequencies to construct a unique encoding for each position:

  • Even dimensions:
$$PE(pos, 2i) = \sin\left( \frac{pos}{10000^{2i/d_{model}}} \right)$$
  • Odd dimensions:
$$PE(pos, 2i+1) = \cos\left( \frac{pos}{10000^{2i/d_{model}}} \right)$$
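The two formulas translate fairly directly into code. The following is a minimal sketch using only the Python standard library; the function name `sinusoidal_encoding`, the `base` parameter, and the even-`d_model` restriction are assumptions of this sketch, not part of the lesson.

```python
import math


def sinusoidal_encoding(seq_len, d_model, base=10000.0):
    """Return a seq_len x d_model table (list of lists) of positional encodings."""
    if d_model % 2 != 0:
        raise ValueError("this sketch assumes an even d_model")
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(d_model // 2):
            # Both members of a dimension pair share the same angle.
            angle = pos / (base ** (2 * i / d_model))
            pe[pos][2 * i] = math.sin(angle)      # even dimension
            pe[pos][2 * i + 1] = math.cos(angle)  # odd dimension
    return pe
```

In practice this table is usually computed once as an array and added to the token embeddings; the nested loops are kept here only to mirror the formulas term by term.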

Different dimensions use different frequencies – lower dimensions oscillate quickly, higher dimensions change slowly. Together they form a unique fingerprint for each position. Because the encoding is a fixed function of position rather than a learned table, it can also be computed for sequence lengths not seen during training.
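One way to see the spread of frequencies is to compute the wavelength of each dimension pair, that is, how many positions it takes for the sine and cosine to complete one full cycle. The helper `wavelengths` and the choice of `d_model = 8` below are illustrative assumptions.

```python
import math


def wavelengths(d_model, base=10000.0):
    """Positions per full sine/cosine cycle for each dimension pair i."""
    return [2 * math.pi * base ** (2 * i / d_model) for i in range(d_model // 2)]


w = wavelengths(8)
for i, length in enumerate(w):
    print(f"pair i={i}: wavelength of about {length:.1f} positions")
```

The first pair repeats roughly every 2π positions, and each later pair repeats more slowly than the one before it.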

A Worked Example

For a sequence of length 3 with d_model = 4:

| Position | PE(pos, 0) | PE(pos, 1) | PE(pos, 2) | PE(pos, 3) |
|---|---|---|---|---|
| 0 | sin(0) = 0.0 | cos(0) = 1.0 | sin(0) = 0.0 | cos(0) = 1.0 |
| 1 | sin(1) ≈ 0.841 | cos(1) ≈ 0.540 | sin(0.01) ≈ 0.010 | cos(0.01) ≈ 1.000 |
| 2 | sin(2) ≈ 0.909 | cos(2) ≈ −0.416 | sin(0.02) ≈ 0.020 | cos(0.02) ≈ 1.000 |

Each row is a unique vector added to the corresponding token embedding. Notice how columns 0–1 change rapidly while columns 2–3 change slowly – this multi-frequency structure is what makes each position distinguishable.
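The worked example can be checked numerically. This short sketch recomputes the three rows for d_model = 4 with the standard library and rounds to three decimals; the variable names are illustrative.

```python
import math

d_model = 4
rows = []
for pos in range(3):
    row = []
    for i in range(d_model // 2):
        # i = 0 divides by 1, i = 1 divides by 10000 ** 0.5 = 100.
        angle = pos / (10000 ** (2 * i / d_model))
        row += [math.sin(angle), math.cos(angle)]
    rows.append([round(v, 3) for v in row])

for pos, row in enumerate(rows):
    print(pos, row)
```

The rounded values should agree with the table above. The angle for columns 2–3 is pos / 100, which is why those columns barely move over the first few positions.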


Which statement best describes how positional encoding is used in a transformer model?
