Learning Rate Scheduling
Learning rate scheduling is a powerful technique used to adjust the learning rate dynamically during the training of neural networks. The learning rate controls the size of the steps taken during optimization; if it is too high, training can become unstable and diverge, while a rate that is too low will make training slow and potentially get stuck in poor local minima. Scheduling the learning rate allows you to start with a relatively large value to speed up initial learning, then reduce it as training progresses to fine-tune the model's weights.
Mathematically, a scheduled learning rate can be expressed as a function of the current epoch or training step, such as:
$$\mathrm{lr}_t = \mathrm{lr}_0 \cdot \gamma^{\lfloor t/n \rfloor}$$

where:
- $\mathrm{lr}_0$ is the initial learning rate;
- $\gamma$ is the decay factor;
- $n$ is the number of epochs before each decay;
- $t$ is the current epoch or step.
This approach helps the optimizer converge more smoothly and can prevent overshooting or oscillation near minima.
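To see what this formula produces in practice, here is a minimal sketch that evaluates it directly; the function name `step_decay` and the sample values (initial rate 0.1, decay factor 0.1, decay every 10 epochs) are illustrative choices, not part of any library:

```python
def step_decay(lr0, gamma, n, t):
    """Learning rate at epoch t under a step-decay schedule."""
    return lr0 * gamma ** (t // n)

# With lr0=0.1, gamma=0.1, n=10, the rate drops by a factor of 10 every 10 epochs
for t in [0, 5, 10, 19, 20, 25]:
    print(f"epoch {t:2d}: lr = {step_decay(0.1, 0.1, 10, t):.4f}")
# epoch  0: lr = 0.1000
# epoch  5: lr = 0.1000
# epoch 10: lr = 0.0100
# epoch 19: lr = 0.0100
# epoch 20: lr = 0.0010
# epoch 25: lr = 0.0010
```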
```python
import torch
import torch.optim as optim
import matplotlib.pyplot as plt

# Dummy model parameters
params = [torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))]

# Optimizer with initial learning rate
optimizer = optim.SGD(params, lr=0.1)

# StepLR scheduler: decay LR by gamma=0.1 every 10 epochs
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

learning_rates = []
epochs = 25

for epoch in range(epochs):
    # Normally the training pass goes here;
    # optimizer.step() should be called before scheduler.step()
    optimizer.step()
    scheduler.step()

    # Record current learning rate
    current_lr = optimizer.param_groups[0]['lr']
    learning_rates.append(current_lr)

plt.figure(figsize=(8, 4))
plt.plot(range(1, epochs + 1), learning_rates, marker='o')
plt.title('Learning Rate Schedule Over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Learning Rate')
plt.grid(True)
plt.show()
```
The code sample above illustrates a step-based learning rate scheduler in PyTorch. Here, the learning rate begins at 0.1 and is reduced by a factor of 0.1 every 10 epochs. This means that for the first 10 epochs, the optimizer takes relatively large steps, allowing the model to quickly explore the loss surface. After each scheduled step, the learning rate drops, which leads to more cautious, fine-grained updates. This scheduling strategy can help the model converge more efficiently and avoid missing narrow minima due to overly aggressive updates. You will often see the loss decrease rapidly at first, then slow its descent as the learning rate drops, reflecting a transition from coarse to fine optimization. Using such schedules is especially useful in deeper or more complex models, where the dynamics of training can change significantly as the model approaches good solutions.
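In a full training run, `scheduler.step()` is normally called once per epoch, after `optimizer.step()`. The sketch below shows one way to wire this up; the tiny linear model, random data, and loss function are placeholders chosen only to illustrate where the scheduler call fits:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Placeholder model and data; any model and dataset would work the same way
model = nn.Linear(10, 1)
inputs = torch.randn(64, 10)
targets = torch.randn(64, 1)
loss_fn = nn.MSELoss()

optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(25):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()    # update weights using the current learning rate
    scheduler.step()    # then advance the schedule once per epoch
    print(f"epoch {epoch + 1}: lr = {scheduler.get_last_lr()[0]:.4f}, loss = {loss.item():.4f}")
```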