Weight Decay in Practice
Understanding weight decay is essential for regularizing neural networks effectively. Weight decay helps prevent overfitting by discouraging large weights: it adds a penalty to the loss function that is proportional to the sum of the squared weights, so the total objective becomes the data loss plus a coefficient times that sum. This penalty pushes the optimizer toward solutions with smaller weights, which often generalize better to unseen data. For plain SGD, weight decay is mathematically equivalent to L2 regularization, and many frameworks use the two terms interchangeably; for adaptive optimizers such as Adam the two are not exactly identical, which is why PyTorch also provides AdamW with decoupled weight decay. In either case, applying weight decay encourages the model to balance fitting the training data with keeping the weights small.
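The penalty itself is easy to write by hand, which makes the connection to the loss function concrete. The sketch below adds an L2 term to an ordinary MSE loss; the coefficient name lam and the tiny linear model are illustrative assumptions, not part of the lesson's code.

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(20, 1)   # tiny model, just for illustration
criterion = nn.MSELoss()

X = torch.randn(8, 20)
y = torch.randn(8, 1)

lam = 1e-4                 # penalty strength, same role as weight_decay
data_loss = criterion(model(X), y)
l2_penalty = sum((p ** 2).sum() for p in model.parameters())
total_loss = data_loss + lam * l2_penalty   # objective the optimizer would minimize
total_loss.backward()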
import torch
import torch.nn as nn
import torch.optim as optim

# Create a simple model
torch.manual_seed(0)

class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(20, 64)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(64, 1)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

model = SimpleNet()

# Generate dummy data
X = torch.randn(1000, 20)
y = torch.randn(1000, 1)

# Define optimizer with weight decay (L2 regularization)
weight_decay = 1e-4
optimizer = optim.Adam(model.parameters(), lr=0.01, weight_decay=weight_decay)
criterion = nn.MSELoss()

# Train the model
for epoch in range(5):
    optimizer.zero_grad()
    outputs = model(X)
    loss = criterion(outputs, y)
    loss.backward()
    optimizer.step()

print("Final training loss:", loss.item())
When you apply weight decay as shown above, the optimizer balances how well the model fits the data against how large the weights become. In the PyTorch code, setting the weight_decay parameter makes the optimizer add weight_decay * w to each parameter's gradient during the update step; the printed loss does not include the penalty, but the effect on the weights matches L2 regularization. This keeps the weights small, which is especially useful when training on limited or noisy data, so the model is less likely to overfit and more likely to perform well on new, unseen examples. The weight_decay value controls the trade-off between underfitting and overfitting: a value that is too high can cause underfitting, while one that is too low may leave the model prone to overfitting. Tuning this parameter helps you achieve better generalization and more reliable predictions.
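One way to see this effect is to train the same model twice, once without weight decay and once with a fairly strong value, and compare the overall weight norms; with decay, the norm should come out noticeably smaller. This is a minimal sketch: the helper train_and_norm, the epoch count, and the decay values 0.0 and 1e-2 are illustrative assumptions rather than part of the lesson.

import torch
import torch.nn as nn
import torch.optim as optim

def train_and_norm(weight_decay, epochs=50):
    # Train a small regression model and report the L2 norm of its weights.
    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
    X = torch.randn(1000, 20)
    y = torch.randn(1000, 1)
    optimizer = optim.Adam(model.parameters(), lr=0.01, weight_decay=weight_decay)
    criterion = nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(model(X), y)
        loss.backward()
        optimizer.step()
    total_norm = torch.sqrt(sum((p ** 2).sum() for p in model.parameters()))
    return total_norm.item()

print("Weight norm without decay:", train_and_norm(0.0))
print("Weight norm with decay=1e-2:", train_and_norm(1e-2))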