Transformer Architecture

Challenge: Implementing Multi-Head Attention


Task

You have all the building blocks: scaled dot-product attention from the previous challenge, and the intuition behind multiple heads from the last chapter. Now put them together.

Implement a MultiHeadAttention module as an nn.Module class. It should:

  1. Accept d_model and num_heads as constructor arguments – assert that d_model % num_heads == 0;
  2. Define separate linear projections for Q, K, V, and a final output projection;
  3. In forward(x), split the projections into num_heads heads of dimension d_model // num_heads;
  4. Run scaled dot-product attention independently per head;
  5. Concatenate the head outputs and pass through the output projection.

Implement the module locally.
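
One possible shape for such a module is sketched below. It is a minimal sketch rather than a reference solution: the attribute names (q_proj, k_proj, v_proj, out_proj, d_head) and the helper _split_heads are illustrative choices, it assumes self-attention on an input of shape (batch, seq_len, d_model), and it leaves out masking and dropout. The scaled dot-product step is written inline instead of reusing the function from the previous challenge, so the block runs on its own.

```python
import math

import torch
import torch.nn as nn


class MultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        # Step 1: each head must receive an equal slice of d_model.
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        self.d_model = d_model
        self.num_heads = num_heads
        self.d_head = d_model // num_heads  # per-head dimension

        # Step 2: separate projections for Q, K, V, plus the output projection.
        # (Attribute names are illustrative, not prescribed by the task.)
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def _split_heads(self, t: torch.Tensor) -> torch.Tensor:
        # (batch, seq_len, d_model) -> (batch, num_heads, seq_len, d_head)
        batch, seq_len, _ = t.shape
        return t.view(batch, seq_len, self.num_heads, self.d_head).transpose(1, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, seq_len, _ = x.shape

        # Step 3: project, then split into heads.
        q = self._split_heads(self.q_proj(x))
        k = self._split_heads(self.k_proj(x))
        v = self._split_heads(self.v_proj(x))

        # Step 4: scaled dot-product attention, batched over the head axis,
        # so every head is computed independently of the others.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        weights = torch.softmax(scores, dim=-1)
        heads = weights @ v  # (batch, num_heads, seq_len, d_head)

        # Step 5: concatenate heads back to d_model and apply the output projection.
        merged = heads.transpose(1, 2).contiguous().view(batch, seq_len, self.d_model)
        return self.out_proj(merged)


# Quick shape check: the output shape should match the input shape.
mha = MultiHeadAttention(d_model=64, num_heads=8)
x = torch.randn(2, 10, 64)
print(mha(x).shape)  # torch.Size([2, 10, 64])
```

Batching all heads along one tensor axis, instead of looping over num_heads separate modules, keeps the heads independent because the matrix multiplications never mix values across that axis.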


Section 1. Chapter 5
