Learning Rate Scheduling
Learning rate scheduling is a powerful technique used to adjust the learning rate dynamically during the training of neural networks. The learning rate controls the size of the steps taken during optimization; if it is too high, training can become unstable and diverge, while a rate that is too low will make training slow and potentially get stuck in poor local minima. Scheduling the learning rate allows you to start with a relatively large value to speed up initial learning, then reduce it as training progresses to fine-tune the model's weights.
Mathematically, a scheduled learning rate can be expressed as a function of the current epoch or training step, such as:
$$\mathrm{lr}_t = \mathrm{lr}_0 \cdot \gamma^{\lfloor t/n \rfloor}$$

where:
- $\mathrm{lr}_0$ is the initial learning rate;
- $\gamma$ is the decay factor;
- $n$ is the number of epochs before each decay;
- $t$ is the current epoch or step.
This approach helps the optimizer converge more smoothly and can prevent overshooting or oscillation near minima.
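To see what this formula produces in practice, here is a minimal sketch that evaluates it directly; the function name `step_decay` and the sample values (initial rate 0.1, decay factor 0.1, decay every 10 epochs) are illustrative choices, not part of any library:

```python
def step_decay(lr0, gamma, n, t):
    """Learning rate at epoch t under a step-decay schedule."""
    return lr0 * gamma ** (t // n)

# With lr0=0.1, gamma=0.1, n=10, the rate drops by a factor of 10 every 10 epochs
for t in [0, 5, 10, 19, 20, 25]:
    print(f"epoch {t:2d}: lr = {step_decay(0.1, 0.1, 10, t):.4f}")
# epoch  0: lr = 0.1000
# epoch  5: lr = 0.1000
# epoch 10: lr = 0.0100
# epoch 19: lr = 0.0100
# epoch 20: lr = 0.0010
# epoch 25: lr = 0.0010
```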
```python
import torch
import torch.optim as optim
import matplotlib.pyplot as plt

# Dummy model parameters
params = [torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))]

# Optimizer with initial learning rate
optimizer = optim.SGD(params, lr=0.1)

# StepLR scheduler: decay LR by gamma=0.1 every 10 epochs
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

learning_rates = []
epochs = 25

for epoch in range(epochs):
    # Normally the training pass goes here;
    # optimizer.step() should be called before scheduler.step()
    optimizer.step()
    scheduler.step()

    # Record current learning rate
    current_lr = optimizer.param_groups[0]['lr']
    learning_rates.append(current_lr)

plt.figure(figsize=(8, 4))
plt.plot(range(1, epochs + 1), learning_rates, marker='o')
plt.title('Learning Rate Schedule Over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Learning Rate')
plt.grid(True)
plt.show()
```
The code sample above illustrates a step-based learning rate scheduler in PyTorch. Here, the learning rate begins at 0.1 and is reduced by a factor of 0.1 every 10 epochs. This means that for the first 10 epochs, the optimizer takes relatively large steps, allowing the model to quickly explore the loss surface. After each scheduled step, the learning rate drops, which leads to more cautious, fine-grained updates. This scheduling strategy can help the model converge more efficiently and avoid missing narrow minima due to overly aggressive updates. You will often see the loss decrease rapidly at first, then slow its descent as the learning rate drops, reflecting a transition from coarse to fine optimization. Using such schedules is especially useful in deeper or more complex models, where the dynamics of training can change significantly as the model approaches good solutions.
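In a full training run, `scheduler.step()` is normally called once per epoch, after `optimizer.step()`. The sketch below shows one way to wire this up; the tiny linear model, random data, and loss function are placeholders chosen only to illustrate where the scheduler call fits:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Placeholder model and data; any model and dataset would work the same way
model = nn.Linear(10, 1)
inputs = torch.randn(64, 10)
targets = torch.randn(64, 1)
loss_fn = nn.MSELoss()

optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(25):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()    # update weights using the current learning rate
    scheduler.step()    # then advance the schedule once per epoch
    print(f"epoch {epoch + 1}: lr = {scheduler.get_last_lr()[0]:.4f}, loss = {loss.item():.4f}")
```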