Pre-training Large Language Models

Implementing the Pre-training Loop in PyTorch


The pre-training loop is the core of language model training. Each iteration follows the same sequence: load a batch, compute the loss, backpropagate, update weights.

The Training Loop

```python
import torch
import torch.nn as nn
import torch.optim as optim

class SimpleLanguageModel(nn.Module):
    def __init__(self, vocab_size, d_model):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)
        self.linear = nn.Linear(d_model, vocab_size)

    def forward(self, x):
        return self.linear(self.embedding(x))

vocab_size = 1000
d_model = 128
batch_size = 16
seq_len = 20

model = SimpleLanguageModel(vocab_size, d_model)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Dummy batch: input tokens and their shifted targets
inputs = torch.randint(0, vocab_size, (batch_size, seq_len))
targets = torch.randint(0, vocab_size, (batch_size, seq_len))

model.train()
for epoch in range(20):
    optimizer.zero_grad()          # 1. Clear gradients
    logits = model(inputs)         # 2. Forward pass
    loss = criterion(              # 3. Compute CLM loss
        logits.view(-1, vocab_size),
        targets.view(-1)
    )
    loss.backward()                # 4. Backpropagate
    optimizer.step()               # 5. Update weights
    print(f"Epoch {epoch + 1:02d} - loss: {loss.item():.4f}")
```

Each step has a fixed role. Calling optimizer.zero_grad() first is essential – without it, gradients from the previous batch accumulate and corrupt the update.
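You can see the accumulation behavior directly in a minimal sketch with a single parameter (the trivial loss here is an illustrative assumption, chosen so the gradient is exactly 1):

```python
import torch

# One parameter with a trivial loss: loss = w, so d(loss)/dw = 1
w = torch.ones(1, requires_grad=True)

(w * 1.0).sum().backward()
print(w.grad)  # tensor([1.])

# Without zeroing, a second backward pass ADDS to the stored gradient
(w * 1.0).sum().backward()
print(w.grad)  # tensor([2.])

# Zeroing resets it - this is what optimizer.zero_grad() does for every parameter
w.grad.zero_()
(w * 1.0).sum().backward()
print(w.grad)  # tensor([1.])
```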

Note that in a real pre-training setup, targets is inputs shifted by one position to the right – the CLM objective from the previous chapter. The dummy random targets here are only for demonstrating the loop structure.
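A minimal sketch of that shift, using a toy sequence of token ids (the specific ids are made up for illustration):

```python
import torch

# A toy "document" of token ids; in practice these come from the tokenizer
token_ids = torch.tensor([5, 17, 42, 8, 99, 3])

# CLM objective: predict token t+1 from tokens up to t
inputs = token_ids[:-1]   # tensor([ 5, 17, 42,  8, 99])
targets = token_ids[1:]   # tensor([17, 42,  8, 99,  3])

# For a batch of shape (batch_size, seq_len + 1), slice along dim 1
batch = torch.randint(0, 1000, (16, 21))
batch_inputs = batch[:, :-1]   # shape (16, 20)
batch_targets = batch[:, 1:]   # shape (16, 20)
```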

Run this locally and watch the loss decrease over 20 epochs. Then replace SimpleLanguageModel with the transformer you built in Course 2 to see the full pipeline in action.
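As a rough sketch of what that swap could look like, here is an illustrative stand-in built from PyTorch's `nn.TransformerEncoder` with a causal mask. The class name and hyperparameters are assumptions, not the Course 2 model itself; the point is that the training loop above stays unchanged:

```python
import torch
import torch.nn as nn

class TinyTransformerLM(nn.Module):
    # Illustrative stand-in for the transformer built in Course 2
    def __init__(self, vocab_size, d_model, n_heads=4, n_layers=2, max_len=512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)
        self.pos_embedding = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, x):
        seq_len = x.size(1)
        positions = torch.arange(seq_len, device=x.device)
        h = self.embedding(x) + self.pos_embedding(positions)
        # Causal mask: each position may attend only to earlier positions
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len).to(x.device)
        h = self.encoder(h, mask=mask)
        return self.lm_head(h)

# Drop-in replacement for SimpleLanguageModel; same (batch, seq, vocab) logits
model = TinyTransformerLM(vocab_size=1000, d_model=128)
logits = model(torch.randint(0, 1000, (16, 20)))
print(logits.shape)  # torch.Size([16, 20, 1000])
```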


Which of the following correctly lists the main steps in each iteration of a PyTorch training loop for language modeling?

