Challenge: Structuring Encoder and Decoder Blocks
Understanding the structure of encoder and decoder blocks is key to mastering how Transformers process and generate text. Each encoder block in a Transformer is designed to transform input sequences into context-rich representations, while each decoder block generates output sequences by attending to both previous outputs and the encoder’s representations. In sequence-to-sequence text tasks, such as translation or summarization, the encoder takes the input text and encodes it into a series of hidden states. The decoder then uses these hidden states, along with its own self-attention, to generate the target sequence step by step. This interaction between encoder and decoder blocks enables the model to capture complex dependencies in text, making Transformers highly effective for a wide range of natural language processing tasks.
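The encoder–decoder interaction described above can be sketched with a toy cross-attention step in numpy. This is a minimal illustration, not the course's implementation: the `attention` helper, the random toy states, and all dimensions are assumptions for demonstration. The key point is that the decoder's queries attend over the encoder's hidden states.

```python
import numpy as np

def attention(q, k, v):
    # Scaled dot-product attention: each query attends to all keys.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v

# Toy dimensions: 4 source tokens, 3 target tokens, hidden size 8.
rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(4, 8))  # encoder output (hidden states)
decoder_states = rng.normal(size=(3, 8))  # decoder's own representations

# Cross-attention: decoder queries attend to encoder keys/values.
context = attention(decoder_states, encoder_states, encoder_states)
print(context.shape)  # (3, 8): one context vector per target token
```

Each of the 3 target-token queries produces a weighted mixture of the 4 encoder hidden states, which is how the decoder "reads" the input text while generating.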
The following table summarizes the sequence of operations in a Transformer encoder block and highlights their importance for text data:

| Operation | Purpose for text data |
|---|---|
| Multi-head self-attention | Lets every token attend to every other token, capturing contextual relationships across the sequence |
| Add & layer normalization | Adds a residual connection around the attention output and normalizes it, stabilizing training |
| Position-wise feed-forward network | Applies a non-linear transformation to each token representation independently |
| Add & layer normalization | Adds a residual connection around the feed-forward output and normalizes it |

Each operation ensures that the encoder builds increasingly abstract and context-aware representations of the input text, which are essential for downstream sequence-to-sequence tasks.
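The sequence of operations in the table can be traced in a simplified numpy sketch. This is a pedagogical stand-in, not a real implementation: it uses a single attention head with no learned projections, and `w1`/`w2` are random toy weights.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean and unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def self_attention(x):
    # Simplified single-head self-attention without learned projections.
    scores = x @ x.T / np.sqrt(x.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x

def feed_forward(x, w1, w2):
    # Position-wise FFN: expand, apply ReLU, project back.
    return np.maximum(x @ w1, 0) @ w2

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 16))            # 5 tokens, hidden size 16
w1 = rng.normal(size=(16, 64)) * 0.1    # toy FFN weights
w2 = rng.normal(size=(64, 16)) * 0.1

h = layer_norm(x + self_attention(x))          # attention + residual + norm
out = layer_norm(h + feed_forward(h, w1, w2))  # FFN + residual + norm
print(out.shape)  # (5, 16): same shape, more context-aware representation
```

Note that the block preserves the `(tokens, hidden_dim)` shape throughout, which is what lets encoder blocks be stacked.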
Your task is to complete the missing parts of the `TransformerEncoderBlock` class by correctly initializing its key components and ensuring they are used in the forward computation.

The `TransformerEncoderBlock` class models a single encoder block in a Transformer architecture. You need to properly initialize two main components in the class constructor (the `__init__` method):

- the self-attention layer, using `MultiHeadAttention(hidden_dim)`. This layer enables each token in the input to attend to all other tokens, capturing contextual relationships;
- the feed-forward network, using `FeedForward(hidden_dim)`. This layer applies a non-linear transformation to each token representation individually, further processing the output of the self-attention layer.

In the `forward` method, apply these two layers sequentially:

1. first, pass the input `x` through the self-attention layer;
2. then, pass the output of the self-attention layer through the feed-forward network;
3. return the final output.

Make sure to instantiate both `MultiHeadAttention` and `FeedForward` with the `hidden_dim` parameter provided to the class. These components form the core structure of an encoder block and must be applied in the correct order for the block to work properly.
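A possible completion of the exercise is sketched below. Since the course's `MultiHeadAttention` and `FeedForward` classes are not shown here, the sketch includes minimal numpy stand-ins with the assumed interface (constructed with `hidden_dim`, called on a token matrix); in the real exercise you would use the provided classes instead.

```python
import numpy as np

class MultiHeadAttention:
    # Stand-in for the course's MultiHeadAttention layer (assumed interface:
    # constructed with hidden_dim, applied to a (tokens, hidden_dim) matrix).
    def __init__(self, hidden_dim):
        self.hidden_dim = hidden_dim

    def __call__(self, x):
        scores = x @ x.T / np.sqrt(self.hidden_dim)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        return w @ x

class FeedForward:
    # Stand-in for the course's FeedForward layer: expand, ReLU, project back.
    def __init__(self, hidden_dim):
        rng = np.random.default_rng(0)
        self.w1 = rng.normal(size=(hidden_dim, 4 * hidden_dim)) * 0.1
        self.w2 = rng.normal(size=(4 * hidden_dim, hidden_dim)) * 0.1

    def __call__(self, x):
        return np.maximum(x @ self.w1, 0) @ self.w2

class TransformerEncoderBlock:
    def __init__(self, hidden_dim):
        # Initialize both core components with the provided hidden_dim.
        self.attention = MultiHeadAttention(hidden_dim)
        self.feed_forward = FeedForward(hidden_dim)

    def forward(self, x):
        # Apply self-attention first, then the feed-forward network.
        attn_out = self.attention(x)
        return self.feed_forward(attn_out)

block = TransformerEncoderBlock(hidden_dim=16)
x = np.random.default_rng(2).normal(size=(5, 16))  # 5 tokens, hidden size 16
out = block.forward(x)
print(out.shape)  # (5, 16)
```

The order matters: self-attention mixes information across tokens first, and the feed-forward network then transforms each enriched token representation independently.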