Learning Rate Scheduling Strategies
Pre-training Large Language Models

A fixed learning rate is rarely optimal for LLM training. Too high at the start causes unstable updates; too high at the end prevents convergence to a good minimum. Learning rate scheduling adjusts the rate dynamically throughout training.

Linear Warmup

Start with a near-zero learning rate and increase it linearly to the target value over the first warmup_steps steps. This gives the model time to settle into a reasonable parameter space before large gradient updates begin.
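As a minimal standalone sketch (the function name and step values are illustrative, not part of a library API), the warmup multiplier applied to the target learning rate is simply:

```python
def warmup_factor(step, warmup_steps):
    # Fraction of the target learning rate to use at this step,
    # rising linearly from 0 to 1 over the warmup window,
    # then held at 1 once warmup is complete.
    return min(1.0, step / max(1, warmup_steps))

print(warmup_factor(0, 100))    # → 0.0 (start of training)
print(warmup_factor(50, 100))   # → 0.5 (halfway through warmup)
print(warmup_factor(100, 100))  # → 1.0 (warmup complete)
```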

Cosine Decay

After warmup, decay the learning rate following a cosine curve – large updates early, fine-grained adjustments later. The rate approaches zero by the end of training. This is the most widely used schedule for LLM pre-training.
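The decay multiplier after warmup can be sketched as follows (a standalone illustration; names and step values are arbitrary):

```python
import math

def cosine_decay_factor(step, warmup_steps, total_steps):
    # Fraction of training completed since warmup ended, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Starts at 1.0 (full learning rate) and falls to 0.0 by the final step.
    return 0.5 * (1.0 + math.cos(math.pi * progress))

print(cosine_decay_factor(100, 100, 5000))   # → 1.0 (right after warmup)
print(cosine_decay_factor(2550, 100, 5000))  # ≈ 0.5 (mid-training)
print(cosine_decay_factor(5000, 100, 5000))  # ≈ 0.0 (end of training)
```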

Implementation

import torch
import torch.nn as nn
import math
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

model = nn.Linear(10, 10)
optimizer = AdamW(model.parameters(), lr=2e-4)

warmup_steps = 100
total_steps = 5000

def cosine_with_warmup(step):
    if step < warmup_steps:
        # Linear warmup
        return step / max(1, warmup_steps)
    # Cosine decay
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda=cosine_with_warmup)

# Simulating a training loop
for step in range(total_steps):
    optimizer.zero_grad()
    # Forward and backward pass would go here
    loss = model(torch.randn(4, 10)).sum()
    loss.backward()
    optimizer.step()
    scheduler.step()

    if step % 1000 == 0:
        current_lr = scheduler.get_last_lr()[0]
        print(f"Step {step:05d} – lr: {current_lr:.6f}")
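Before launching a long run, it can be worth sanity-checking the schedule's shape without instantiating a model or optimizer. The sketch below re-implements the same `cosine_with_warmup` multiplier in pure Python and evaluates the effective learning rate at a few milestones (the chosen step values are illustrative):

```python
import math

base_lr = 2e-4
warmup_steps = 100
total_steps = 5000

def cosine_with_warmup(step):
    # Same multiplier the LambdaLR schedule applies to the base rate.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (0, 50, 100, 2550, 4999):
    # Near 0 at the start, peaking at base_lr right after warmup,
    # then decaying toward 0 by the final step.
    print(f"step {step:4d}: lr = {base_lr * cosine_with_warmup(step):.6e}")
```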

