Transformer Architecture

Challenge: Implementing Scaled Dot-Product Attention

Task

You now have all the pieces to implement scaled dot-product attention from scratch. Using the formula from the previous chapter, write a function scaled_dot_product_attention that:

  1. Takes Q, K, V tensors of shape (batch_size, seq_len, d_k) as input;
  2. Accepts an optional mask tensor of shape (batch_size, seq_len_q, seq_len_k), where seq_len_q and seq_len_k both equal seq_len for the inputs above; when provided, positions where mask == 0 should be set to -inf before softmax;
  3. Returns the output tensor and the attention weights.

Implement the function locally.
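
For checking a local attempt, the three requirements above could be sketched roughly as follows. This is only one possible version, not the course's reference solution. It assumes NumPy arrays stand in for tensors (the course may expect a different framework) and that the formula from the previous chapter is the standard softmax(Q K^T / sqrt(d_k)) V. The example inputs and the causal mask at the bottom are made up for illustration.

```python
import numpy as np


def scaled_dot_product_attention(Q, K, V, mask=None):
    """Return (output, attention_weights) for batched Q, K, V.

    Q, K, V: arrays of shape (batch_size, seq_len, d_k).
    mask:    optional array of shape (batch_size, seq_len_q, seq_len_k);
             positions where mask == 0 are excluded from attention.
    """
    d_k = Q.shape[-1]

    # Raw similarity scores, shape (batch_size, seq_len_q, seq_len_k).
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)

    if mask is not None:
        # Masked positions become -inf so softmax assigns them zero weight.
        # Note: a row that is masked everywhere would produce NaN here.
        scores = np.where(mask == 0, -np.inf, scores)

    # Numerically stable softmax over the key dimension.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Weighted sum of values, shape (batch_size, seq_len_q, d_k).
    output = weights @ V
    return output, weights


# Illustrative inputs (assumed sizes): batch of 2, sequence length 4, d_k = 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4, 8))
K = rng.normal(size=(2, 4, 8))
V = rng.normal(size=(2, 4, 8))

# Hypothetical causal mask: each position attends only to itself and earlier ones.
causal_mask = np.tril(np.ones((4, 4)))[None, :, :].repeat(2, axis=0)

output, weights = scaled_dot_product_attention(Q, K, V, mask=causal_mask)
print(output.shape, weights.shape)  # (2, 4, 8) (2, 4, 4)
```

Each row of the returned weights sums to 1, and entries at masked positions are exactly 0, which gives two quick sanity checks for any implementation.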

Section 1. Chapter 3