Learn Challenge: Structuring Encoder and Decoder Blocks | Building Transformer Components
Transformers for Natural Language Processing
Section 2. Chapter 5
Challenge: Structuring Encoder and Decoder Blocks


Understanding the structure of encoder and decoder blocks is key to mastering how Transformers process and generate text. Each encoder block in a Transformer is designed to transform input sequences into context-rich representations, while each decoder block generates output sequences by attending to both previous outputs and the encoder’s representations. In sequence-to-sequence text tasks, such as translation or summarization, the encoder takes the input text and encodes it into a series of hidden states. The decoder then uses these hidden states, along with its own self-attention, to generate the target sequence step by step. This interaction between encoder and decoder blocks enables the model to capture complex dependencies in text, making Transformers highly effective for a wide range of natural language processing tasks.
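The encoder–decoder interaction described above can be sketched with a toy NumPy example. This is an illustrative simplification: it uses raw hidden states directly as queries, keys, and values, and omits the learned projections and the causal masking a real decoder applies to its self-attention.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(query, key, value):
    # Scaled dot-product attention: each query position takes a
    # weighted mix of the value rows, weighted by query-key similarity.
    d = query.shape[-1]
    return softmax(query @ key.T / np.sqrt(d)) @ value

# Encoder output: hidden states for a 6-token source sentence, dim 8.
encoder_states = rng.standard_normal((6, 8))
# Decoder input: representations of the 3 target tokens generated so far.
decoder_states = rng.standard_normal((3, 8))

# Decoder self-attention: each target token attends to previous outputs.
self_attended = attention(decoder_states, decoder_states, decoder_states)
# Cross-attention: target tokens attend to the encoder's hidden states.
cross_attended = attention(self_attended, encoder_states, encoder_states)

print(cross_attended.shape)  # (3, 8): one context vector per target token
```

Note how cross-attention is where the two halves of the model meet: the decoder supplies the queries, while the encoder's hidden states supply the keys and values.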

The following table summarizes the sequence of operations in a Transformer encoder block and highlights their importance for text data:

| Step | Operation | Purpose for Text Data |
| --- | --- | --- |
| 1 | Multi-head self-attention | Captures relationships between all tokens in the input. |
| 2 | Add & Normalize | Stabilizes training and preserves information. |
| 3 | Feed-forward network | Applies non-linear transformations to each token. |
| 4 | Add & Normalize | Further stabilizes and enables deep stacking. |

Each operation ensures that the encoder builds increasingly abstract and context-aware representations of the input text, which are essential for downstream sequence-to-sequence tasks.
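The four steps in the table can be sketched end to end in NumPy. This is a minimal, untrained illustration: the random projection matrices are placeholders, and the layer norm omits the learnable scale and shift parameters a real implementation would have.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    # "Add & Normalize" normalization part: standardize each token vector.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def encoder_block(x, num_heads=2, seed=0):
    rng = np.random.default_rng(seed)
    seq_len, d = x.shape
    d_head = d // num_heads
    # Step 1: multi-head self-attention (random, untrained projections).
    Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    heads = []
    for h in range(num_heads):
        sl = slice(h * d_head, (h + 1) * d_head)
        scores = q[:, sl] @ k[:, sl].T / np.sqrt(d_head)
        heads.append(softmax(scores) @ v[:, sl])
    attn_out = np.concatenate(heads, axis=-1) @ Wo
    # Step 2: residual connection ("Add") + layer norm.
    x = layer_norm(x + attn_out)
    # Step 3: position-wise feed-forward network (applied to each token).
    W1 = rng.standard_normal((d, 4 * d)) / np.sqrt(d)
    W2 = rng.standard_normal((4 * d, d)) / np.sqrt(4 * d)
    ffn_out = np.maximum(0, x @ W1) @ W2  # ReLU between the two layers
    # Step 4: residual connection + layer norm again.
    return layer_norm(x + ffn_out)

tokens = np.random.default_rng(1).standard_normal((5, 8))  # 5 tokens, dim 8
out = encoder_block(tokens)
print(out.shape)  # (5, 8): same shape in and out, so blocks can be stacked
```

Because the output has the same shape as the input, identical blocks can be stacked, which is exactly what the second Add & Normalize step enables.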

Task


Your task is to complete the missing parts of the TransformerEncoderBlock class by correctly initializing its key components and ensuring they are used in the forward computation.

The TransformerEncoderBlock class models a single encoder block in a Transformer architecture. You need to properly initialize two main components in the class constructor (__init__ method):

  • The self-attention layer using MultiHeadAttention(hidden_dim). This layer enables each token in the input to attend to all other tokens, capturing contextual relationships;
  • The feed-forward network using FeedForward(hidden_dim). This layer applies non-linear transformations to each token representation individually, further processing the output from the self-attention layer;
  • In the forward method, you should apply these two layers sequentially:
    • first, pass the input x through the self-attention layer;
    • then, pass the output from the self-attention layer through the feed-forward network;
    • return the final output.

Make sure to instantiate both MultiHeadAttention and FeedForward with the hidden_dim parameter provided to the class. These components represent the core structure of an encoder block and must be applied in the correct order for the block to function properly.

Solution
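One possible completion is sketched below. The challenge provides its own `MultiHeadAttention` and `FeedForward` classes, which are not shown on this page, so minimal NumPy stand-ins (with an assumed `forward(x)` API) are included here just to make the sketch runnable; in the actual exercise you would only fill in the constructor and `forward` method of `TransformerEncoderBlock`.

```python
import numpy as np

class MultiHeadAttention:
    """Stand-in for the challenge's MultiHeadAttention class (assumed API)."""
    def __init__(self, hidden_dim, num_heads=2, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden_dim, self.num_heads = hidden_dim, num_heads
        self.Wq, self.Wk, self.Wv, self.Wo = (
            rng.standard_normal((hidden_dim, hidden_dim)) / np.sqrt(hidden_dim)
            for _ in range(4)
        )

    def forward(self, x):
        d_head = self.hidden_dim // self.num_heads
        q, k, v = x @ self.Wq, x @ self.Wk, x @ self.Wv
        heads = []
        for h in range(self.num_heads):
            sl = slice(h * d_head, (h + 1) * d_head)
            scores = q[:, sl] @ k[:, sl].T / np.sqrt(d_head)
            weights = np.exp(scores - scores.max(-1, keepdims=True))
            weights /= weights.sum(-1, keepdims=True)
            heads.append(weights @ v[:, sl])
        return np.concatenate(heads, axis=-1) @ self.Wo

class FeedForward:
    """Stand-in for the challenge's FeedForward class (assumed API)."""
    def __init__(self, hidden_dim, seed=1):
        rng = np.random.default_rng(seed)
        self.W1 = rng.standard_normal((hidden_dim, 4 * hidden_dim)) / np.sqrt(hidden_dim)
        self.W2 = rng.standard_normal((4 * hidden_dim, hidden_dim)) / np.sqrt(4 * hidden_dim)

    def forward(self, x):
        return np.maximum(0, x @ self.W1) @ self.W2  # ReLU between the layers

class TransformerEncoderBlock:
    def __init__(self, hidden_dim):
        # Initialize both components with hidden_dim, as the task requires.
        self.attention = MultiHeadAttention(hidden_dim)
        self.feed_forward = FeedForward(hidden_dim)

    def forward(self, x):
        # Apply the layers sequentially: self-attention first, then FFN.
        x = self.attention.forward(x)
        return self.feed_forward.forward(x)

block = TransformerEncoderBlock(hidden_dim=8)
out = block.forward(np.random.default_rng(2).standard_normal((5, 8)))
print(out.shape)  # (5, 8)
```

A full encoder block would also add the residual connections and normalization steps from the table above; this exercise focuses only on wiring the two core layers in the correct order, and the provided classes may handle the rest internally.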
