Pre-training Large Language Models

Learning Rate Scheduling Strategies

A fixed learning rate is rarely optimal for LLM training. Too high at the start causes unstable updates; too high at the end prevents convergence to a good minimum. Learning rate scheduling adjusts the rate dynamically throughout training.

Linear Warmup

Start with a near-zero learning rate and increase it linearly to the target value over the first warmup_steps steps. This gives the model time to settle into a reasonable parameter space before large gradient updates begin.
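As a sketch, the warmup multiplier at a given step is just the ratio of elapsed steps to `warmup_steps` (the function and variable names here are illustrative, not from a library):

```python
def warmup_factor(step: int, warmup_steps: int) -> float:
    """Linear warmup: multiplier grows from 0 to 1 over the first warmup_steps."""
    if step >= warmup_steps:
        return 1.0  # warmup finished: use the full target learning rate
    return step / max(1, warmup_steps)

# The effective learning rate is the target rate times this multiplier
target_lr = 2e-4
for step in (0, 50, 100, 200):
    print(step, target_lr * warmup_factor(step, warmup_steps=100))
```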

Cosine Decay

After warmup, decay the learning rate following a cosine curve – large updates early, fine-grained adjustments later. The rate approaches zero by the end of training. This is the most widely used schedule for LLM pre-training.
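Concretely, at training progress t in [0, 1] the decay multiplier is 0.5 * (1 + cos(pi * t)), which starts at 1 and falls smoothly to 0. A minimal sketch:

```python
import math

def cosine_decay_factor(progress: float) -> float:
    """Cosine decay: multiplier is 1.0 at progress=0 and 0.0 at progress=1."""
    return 0.5 * (1.0 + math.cos(math.pi * progress))

# Decay is slow at the start and end, fastest in the middle
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"{p:.2f} -> {cosine_decay_factor(p):.4f}")
```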

Implementation

```python
import math

import torch
import torch.nn as nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

model = nn.Linear(10, 10)
optimizer = AdamW(model.parameters(), lr=2e-4)

warmup_steps = 100
total_steps = 5000

def cosine_with_warmup(step):
    if step < warmup_steps:
        # Linear warmup
        return step / max(1, warmup_steps)
    # Cosine decay
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda=cosine_with_warmup)

# Simulating a training loop
for step in range(total_steps):
    optimizer.zero_grad()
    # Forward and backward pass would go here
    loss = model(torch.randn(4, 10)).sum()
    loss.backward()
    optimizer.step()
    scheduler.step()

    if step % 1000 == 0:
        current_lr = scheduler.get_last_lr()[0]
        print(f"Step {step:05d} - lr: {current_lr:.6f}")
```
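In practice, many LLM training recipes do not decay all the way to zero but hold the rate above a floor, often around 10% of the peak. A sketch of that variant, where `min_lr_ratio` is an illustrative parameter, not a library option:

```python
import math

warmup_steps = 100
total_steps = 5000
min_lr_ratio = 0.1  # illustrative floor: final multiplier is 10% of the peak

def cosine_with_warmup_and_floor(step):
    if step < warmup_steps:
        # Linear warmup, same as before
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    # Rescale so the multiplier ends at min_lr_ratio instead of 0
    return min_lr_ratio + (1.0 - min_lr_ratio) * cosine

# Multiplier right after warmup vs. at the very end of training
print(cosine_with_warmup_and_floor(warmup_steps))
print(cosine_with_warmup_and_floor(total_steps))
```

The same function can be passed to `LambdaLR` in place of `cosine_with_warmup` above.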

Which of the following statements about learning rate schedules is correct?

Select the correct answer
