Challenge: Implementing Scaled Dot-Product Attention
Task
You now have all the pieces to implement scaled dot-product attention from scratch. Using the formula from the previous chapter, write a function scaled_dot_product_attention that:
- Takes Q, K, V tensors of shape (batch_size, seq_len, d_k) as input;
- Accepts an optional mask tensor of shape (batch_size, seq_len_q, seq_len_k). When provided, positions where mask == 0 should be set to -inf before softmax;
- Returns the output tensor and the attention weights.
Implement the function locally.
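Here is a minimal sketch of one possible solution. It computes softmax(Q Kᵀ / sqrt(d_k)) V, assumes NumPy arrays in place of framework tensors, and lets K and V have their own sequence length seq_len_k to match the mask shape above. The random inputs and the causal (lower-triangular) mask in the usage lines are illustrative choices, not part of the task.

```python
import numpy as np


def scaled_dot_product_attention(Q, K, V, mask=None):
    """Compute softmax(Q K^T / sqrt(d_k)) V.

    Q:    (batch_size, seq_len_q, d_k)
    K:    (batch_size, seq_len_k, d_k)
    V:    (batch_size, seq_len_k, d_v)
    mask: optional (batch_size, seq_len_q, seq_len_k); entries equal to 0 are masked.
    Returns (output, weights) with shapes (batch_size, seq_len_q, d_v)
    and (batch_size, seq_len_q, seq_len_k).
    """
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = Q @ np.swapaxes(K, -1, -2) / np.sqrt(d_k)
    if mask is not None:
        # Masked positions get -inf, so softmax gives them exactly zero weight.
        scores = np.where(mask == 0, -np.inf, scores)
    # Softmax over the key axis; subtracting the row max avoids overflow in exp.
    scores = scores - scores.max(axis=-1, keepdims=True)
    exp_scores = np.exp(scores)
    weights = exp_scores / exp_scores.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors.
    output = weights @ V
    return output, weights


# Usage with illustrative random inputs and a causal mask (an assumed example).
rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4, 8))
K = rng.standard_normal((2, 4, 8))
V = rng.standard_normal((2, 4, 8))
causal = np.tril(np.ones((4, 4)))          # query i may attend to keys 0..i
mask = np.broadcast_to(causal, (2, 4, 4))  # same mask for every batch element
out, w = scaled_dot_product_attention(Q, K, V, mask)
print(out.shape, w.shape)  # (2, 4, 8) (2, 4, 4)
```

Subtracting the row maximum does not change the softmax result, it only keeps exp from overflowing. One limitation of this sketch: if every key in a row is masked, that row's weights become NaN, and the function does not guard against that case.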
Section 1. Chapter 3