Mathematics of Scaled Dot-Product Attention
Queries, Keys, and Values
Scaled dot-product attention operates on three vectors derived from each input token: a query (Q), a key (K), and a value (V). Each is produced by multiplying the input by a learned weight matrix.
- Q – represents what the current token is looking for;
- K – represents what each token has to offer;
- V – holds the actual information to be aggregated.
During attention, queries are compared against keys to compute relevance scores. Those scores then determine how much of each value to include in the output.
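The projections described above can be sketched as follows. This is a minimal illustration, assuming a model dimension of 8 and bias-free linear layers (both choices are arbitrary here, not prescribed by the text):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

d_model, d_k = 8, 8          # illustrative dimensions
x = torch.rand(4, d_model)   # a sequence of 4 input tokens

# One learned weight matrix per projection
W_q = nn.Linear(d_model, d_k, bias=False)
W_k = nn.Linear(d_model, d_k, bias=False)
W_v = nn.Linear(d_model, d_k, bias=False)

# Each input token yields a query, a key, and a value
Q, K, V = W_q(x), W_k(x), W_v(x)
print(Q.shape, K.shape, V.shape)  # each is (4, 8)
```

Each row of Q, K, and V corresponds to one input token, so the sequence length is preserved through the projections.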
The Formula
Attention(Q, K, V) = softmax(QK⊤ / √d_k) V

Each step breaks down as follows:
- Dot product QK⊤ – computes a raw score for how well each query matches each key;
- Scale by √d_k – divides the scores by √d_k to keep them from growing large when the key dimension is high, which would push softmax into regions with very small gradients;
- Softmax – normalizes the scores into attention weights that sum to 1;
- Multiply by V – produces a weighted sum of value vectors, one output per query.
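The effect of the scaling step can be demonstrated with a small experiment (the dimensions and random scores below are illustrative, not from the text). Without scaling, dot products over a high-dimensional key space have large variance, and softmax collapses toward a near one-hot distribution:

```python
import torch
import torch.nn.functional as F
import math

torch.manual_seed(0)
d_k = 512                     # a deliberately large key dimension
q = torch.randn(d_k)          # one query
k = torch.randn(5, d_k)       # five keys

raw = k @ q                   # unscaled scores: variance grows with d_k
scaled = raw / math.sqrt(d_k) # scaled scores: variance stays near 1

w_raw = F.softmax(raw, dim=-1)
w_scaled = F.softmax(scaled, dim=-1)

# The unscaled weights are far more peaked than the scaled ones
print("max weight without scaling:", w_raw.max().item())
print("max weight with scaling:   ", w_scaled.max().item())
```

A near one-hot softmax is exactly the regime where gradients vanish, which is why the division by √d_k is part of the formula.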
Implementation in PyTorch
```python
import torch
import torch.nn.functional as F
import math

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.size(-1)
    # Computing raw scores
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
    # Converting scores to attention weights
    weights = F.softmax(scores, dim=-1)
    # Aggregating value vectors
    output = weights @ V
    return output, weights

# Sequence of 4 tokens, each projected into dimension 8
Q = torch.rand(4, 8)
K = torch.rand(4, 8)
V = torch.rand(4, 8)

output, weights = scaled_dot_product_attention(Q, K, V)
print("Attention weights:\n", weights)
print("Output shape:", output.shape)
```
Run this locally to observe how the attention weights distribute across tokens and how the output shape relates to the input.
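As a sanity check, the manual implementation can be compared against PyTorch's built-in `torch.nn.functional.scaled_dot_product_attention` (available in PyTorch 2.0+, which returns only the output, not the weights). The batch dimension of 1 below is an assumption for the built-in's expected input shape:

```python
import torch
import torch.nn.functional as F
import math

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
    weights = F.softmax(scores, dim=-1)
    return weights @ V, weights

torch.manual_seed(0)
Q = torch.rand(1, 4, 8)  # batch of 1, 4 tokens, dimension 8
K = torch.rand(1, 4, 8)
V = torch.rand(1, 4, 8)

manual, _ = scaled_dot_product_attention(Q, K, V)
builtin = F.scaled_dot_product_attention(Q, K, V)

# Both paths should agree up to floating-point tolerance
print(torch.allclose(manual, builtin, atol=1e-5))
```

Agreement between the two confirms that the manual version implements the same formula the library uses by default.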
Section 1. Chapter 2