Pre-training Large Language Models

Learning Rate Scheduling Strategies


A fixed learning rate is rarely optimal for LLM training. Too high at the start causes unstable updates; too high at the end prevents convergence to a good minimum. Learning rate scheduling adjusts the rate dynamically throughout training.

Linear Warmup

Start with a near-zero learning rate and increase it linearly to the target value over the first warmup_steps steps. This gives the model time to settle into a reasonable parameter space before large gradient updates begin.
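The warmup multiplier can be sketched as a standalone function (a hypothetical helper, not part of any library API): it ramps linearly from near zero to 1.0, then stays at 1.0.

```python
def warmup_factor(step: int, warmup_steps: int = 100) -> float:
    """Multiplier applied to the base learning rate during linear warmup.

    Ramps linearly from 0 at step 0 to 1.0 at warmup_steps, then holds at 1.0.
    """
    return min(1.0, step / max(1, warmup_steps))

print(warmup_factor(0))    # 0.0
print(warmup_factor(50))   # 0.5 (halfway through warmup)
print(warmup_factor(200))  # 1.0 (warmup finished)
```

Multiplying the base learning rate by this factor gives the effective rate at each step, which is exactly how `LambdaLR` consumes such a function.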

Cosine Decay

After warmup, decay the learning rate following a cosine curve – large updates early, fine-grained adjustments later. The rate approaches zero by the end of training. This is the most widely used schedule for LLM pre-training.
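Ignoring warmup for a moment, the cosine decay multiplier follows half a cosine period, falling from 1.0 to 0.0 over training. A minimal sketch (the function name and defaults are illustrative):

```python
import math

def cosine_factor(step: int, total_steps: int = 5000) -> float:
    """Cosine decay multiplier: 1.0 at step 0, 0.0 at total_steps."""
    progress = min(1.0, step / max(1, total_steps))
    return 0.5 * (1.0 + math.cos(math.pi * progress))

print(cosine_factor(0))     # 1.0 (full learning rate)
print(cosine_factor(2500))  # ~0.5 (halfway through training)
print(cosine_factor(5000))  # 0.0 (decayed to zero)
```

The curve is flat near both ends and steepest in the middle, so the rate changes gently right after warmup and again near convergence.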

Implementation

```python
import math

import torch
import torch.nn as nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

model = nn.Linear(10, 10)
optimizer = AdamW(model.parameters(), lr=2e-4)

warmup_steps = 100
total_steps = 5000

def cosine_with_warmup(step):
    if step < warmup_steps:
        # Linear warmup: ramp the multiplier from 0 to 1
        return step / max(1, warmup_steps)
    # Cosine decay: multiplier falls from 1 to 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda=cosine_with_warmup)

# Simulating a training loop
for step in range(total_steps):
    optimizer.zero_grad()
    # Forward and backward pass would go here
    loss = model(torch.randn(4, 10)).sum()
    loss.backward()
    optimizer.step()
    scheduler.step()
    if step % 1000 == 0:
        current_lr = scheduler.get_last_lr()[0]
        print(f"Step {step:05d} – lr: {current_lr:.6f}")
```

Which of the following statements about learning rate schedules is correct?

Select the correct answer


Section 1. Chapter 8

