Challenge: Build and Test Your Transformer
Task
Build a transformer from scratch and train it on a synthetic sequence-to-sequence task: string reversal. The input is a sequence of random lowercase letters (e.g., "hello"), and the target is its reverse (e.g., "olleh").
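Generating the synthetic data is straightforward; here is a minimal sketch using only the standard library (the function name and defaults are illustrative, not part of the challenge spec):

```python
import random
import string

def make_dataset(n_pairs, min_len=3, max_len=10, seed=0):
    """Generate (source, target) pairs where the target is the reversed source."""
    rng = random.Random(seed)  # fixed seed keeps train/test splits reproducible
    pairs = []
    for _ in range(n_pairs):
        length = rng.randint(min_len, max_len)
        src = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        pairs.append((src, src[::-1]))
    return pairs
```

Because the target is a deterministic function of the input, any failure to reach high accuracy points at the model or training loop rather than noise in the data.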
Use only the components you implemented in previous chapters — scaled dot-product attention, multi-head attention, positional encoding, encoder and decoder blocks, layer normalization, and feed-forward sublayers. Do not use external transformer implementations.
Your implementation should:
- Generate a synthetic dataset of random lowercase strings and their reverses;
- Tokenize strings at the character level and build a vocabulary;
- Assemble a full encoder-decoder transformer from your own components;
- Implement a training loop with cross-entropy loss;
- Evaluate sequence-level accuracy on a held-out test set — the percentage of inputs where the predicted output exactly matches the reversed string.
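The tokenization and evaluation requirements above can be sketched model-agnostically; the special-token names and the `predict` callback below are assumptions for illustration, not a prescribed interface:

```python
import string

# Character-level vocabulary with special tokens (names are illustrative).
PAD, BOS, EOS = "<pad>", "<bos>", "<eos>"
VOCAB = [PAD, BOS, EOS] + list(string.ascii_lowercase)
STOI = {ch: i for i, ch in enumerate(VOCAB)}
ITOS = {i: ch for ch, i in STOI.items()}

def encode(s, max_len):
    """Map a string to a fixed-length list of token ids, padded with PAD."""
    ids = [STOI[BOS]] + [STOI[c] for c in s] + [STOI[EOS]]
    return ids + [STOI[PAD]] * (max_len - len(ids))

def decode(ids):
    """Map token ids back to a string, stopping at EOS and skipping specials."""
    chars = []
    for i in ids:
        tok = ITOS[i]
        if tok == EOS:
            break
        if tok not in (PAD, BOS):
            chars.append(tok)
    return "".join(chars)

def sequence_accuracy(predict, test_pairs):
    """Fraction of pairs whose predicted string exactly matches the target."""
    correct = sum(1 for src, tgt in test_pairs if predict(src) == tgt)
    return correct / len(test_pairs)
```

Note that sequence-level accuracy is stricter than per-token accuracy: a single wrong character anywhere in the output counts the whole sequence as incorrect.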
Once your model trains successfully, experiment with the following:
- Number of encoder and decoder layers;
- Number of attention heads;
- `d_model` and `d_ff` values;
- Sequence length and dataset size;
- Learning rate and number of training epochs.
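A reasonable way to organize these experiments is to keep a single baseline configuration and vary one knob at a time. The values below are an illustrative starting point, not prescribed by the challenge:

```python
# Illustrative baseline; tune against your own implementation.
baseline = {
    "num_encoder_layers": 2,
    "num_decoder_layers": 2,
    "num_heads": 4,
    "d_model": 64,
    "d_ff": 256,
    "max_seq_len": 12,
    "dataset_size": 10_000,
    "learning_rate": 3e-4,
    "epochs": 20,
}

# d_model must be divisible by num_heads so each head gets an equal slice
# of the model dimension (d_model // num_heads per head).
assert baseline["d_model"] % baseline["num_heads"] == 0
```

Logging accuracy for each single-knob variation against this baseline makes it easy to attribute changes in behavior to one hyperparameter at a time.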
Observe how each change affects accuracy and training stability, and note any interesting behaviors: for example, at what point does the model start to generalize, and what happens as sequence length grows?