Learn L1 and L2 Regularization | Regularization Techniques


L1 and L2 regularization are widely used techniques for controlling the complexity of neural network models and reducing overfitting. Both work by adding a penalty term to the loss function that discourages the model from assigning excessively large values to its weights.

L1 regularization, also known as Lasso regularization, adds the sum of the absolute values of the weights to the loss function. Mathematically, if $w$ represents the model's weights and $\lambda$ is the regularization coefficient, the penalty term is:

$$\lambda \times \sum (|w|)$$

This approach encourages sparsity in the model weights, often driving some weights to exactly zero.
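
As a quick hand-checkable example, the snippet below evaluates the L1 penalty for a small, hypothetical weight vector with $\lambda = 0.1$: $0.1 \times (0.5 + 2.0 + 0.0 + 1.5) = 0.4$. The values are chosen only for illustration.

```python
import torch

# Hypothetical weights and coefficient, chosen only for illustration
w = torch.tensor([0.5, -2.0, 0.0, 1.5])
lam = 0.1

# L1 penalty: lambda * sum(|w|) = 0.1 * 4.0
l1_penalty = lam * torch.sum(torch.abs(w))
print("L1 penalty:", l1_penalty.item())
```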

L2 regularization, or Ridge regularization, adds the sum of the squares of the weights to the loss function. Its penalty term is:

$$\lambda \times \sum (w^2)$$

Unlike L1, L2 regularization shrinks all weights toward zero, penalizing large weights most heavily, but it rarely drives any of them exactly to zero.
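
The difference shows up in the gradients of the two penalties: the L2 gradient, $2\lambda w$, weakens as $w$ approaches zero, whereas the L1 (sub)gradient, $\lambda \cdot \operatorname{sign}(w)$, keeps the same magnitude however small $w$ becomes. The sketch below checks both with autograd on a hypothetical weight vector with $\lambda = 0.1$.

```python
import torch

# Hypothetical weights and coefficient, chosen only for illustration
w = torch.tensor([0.5, -2.0, 0.0, 1.5], requires_grad=True)
lam = 0.1

# L2 penalty: lambda * sum(w^2) = 0.1 * (0.25 + 4.0 + 0.0 + 2.25) = 0.65
l2_penalty = lam * torch.sum(w ** 2)
l2_penalty.backward()
grad_l2 = w.grad.clone()  # 2 * lam * w -> [0.1, -0.4, 0.0, 0.3]

# L1 penalty gradient for comparison: lam * sign(w), constant magnitude
w.grad = None
l1_penalty = lam * torch.sum(torch.abs(w))
l1_penalty.backward()
grad_l1 = w.grad.clone()  # [0.1, -0.1, 0.0, 0.1]

print("L2 penalty:", l2_penalty.item())
print("L2 gradient:", grad_l2)
print("L1 gradient:", grad_l1)
```

PyTorch uses a subgradient of 0 for $|w|$ at $w = 0$, which is why the third entry of the L1 gradient is zero.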

Both forms of regularization modify the loss function as follows:

L1: $\text{Loss} = \text{Original\_Loss} + \lambda \times \sum (|w|);$
L2: $\text{Loss} = \text{Original\_Loss} + \lambda \times \sum (w^2).$


```python
import torch
import torch.nn as nn

# Simple fully connected model
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 1)

    def forward(self, x):
        return self.fc(x)

# Instantiate model and dummy data
model = SimpleNet()
x = torch.randn(5, 10)
y = torch.randn(5, 1)

# Standard MSE loss
criterion = nn.MSELoss()
output = model(x)
mse_loss = criterion(output, y)

# L1 and L2 regularization coefficients
lambda_l1 = 0.01
lambda_l2 = 0.01

# Note: model.parameters() yields both fc.weight and fc.bias,
# so the bias is penalized here as well as the weights

# Calculate L1 regularization (sum of absolute weights)
l1_penalty = sum(torch.sum(torch.abs(param)) for param in model.parameters())

# Calculate L2 regularization (sum of squared weights)
l2_penalty = sum(torch.sum(param ** 2) for param in model.parameters())

# Total loss with L1 and L2 regularization
loss_l1 = mse_loss + lambda_l1 * l1_penalty
loss_l2 = mse_loss + lambda_l2 * l2_penalty

print("MSE loss:", mse_loss.item())
print("L1-regularized loss:", loss_l1.item())
print("L2-regularized loss:", loss_l2.item())
```
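
The code above only computes the penalized losses. In training you would backpropagate through the penalized loss; for L2, PyTorch optimizers also accept a `weight_decay` argument. For `torch.optim.SGD`, `weight_decay` adds `weight_decay * p` to each parameter's gradient, which is the gradient of `(weight_decay / 2) * sum(p ** 2)`, so the penalty $\lambda \times \sum (w^2)$ used here corresponds to `weight_decay = 2 * lam`. The sketch below checks this over one step, assuming plain SGD without momentum; `lam`, `lr`, and the data are illustrative.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)

# Two identical copies of a small linear model and dummy data
model_a = nn.Linear(10, 1)
model_b = copy.deepcopy(model_a)
x = torch.randn(5, 10)
y = torch.randn(5, 1)
criterion = nn.MSELoss()

lam = 0.01  # coefficient in lam * sum(w^2), illustrative
lr = 0.1    # learning rate, illustrative

# (a) Explicit L2 penalty added to the loss, plain SGD
opt_a = torch.optim.SGD(model_a.parameters(), lr=lr)
opt_a.zero_grad()
l2_penalty = sum(torch.sum(p ** 2) for p in model_a.parameters())
loss_a = criterion(model_a(x), y) + lam * l2_penalty
loss_a.backward()
opt_a.step()

# (b) No penalty in the loss; SGD adds weight_decay * p to every gradient instead
opt_b = torch.optim.SGD(model_b.parameters(), lr=lr, weight_decay=2 * lam)
opt_b.zero_grad()
loss_b = criterion(model_b(x), y)
loss_b.backward()
opt_b.step()

# After one step, both models should hold the same parameters
same = all(
    torch.allclose(pa, pb, atol=1e-6)
    for pa, pb in zip(model_a.parameters(), model_b.parameters())
)
print("Parameters match:", same)
```

Both versions penalize the bias too, since all parameters are passed in. The equivalence relies on the optimizer adding `weight_decay * p` to the gradient; `torch.optim.AdamW` applies decoupled weight decay, which is not the same as adding an L2 term to the loss.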

L1 and L2 regularization affect a network in distinct ways. L1 regularization penalizes the absolute values of the weights, which pushes many of them toward zero and yields a sparse model. Minima of the L1-penalized objective often have some weights exactly at zero, although plain gradient descent on the penalized loss typically leaves them very close to zero rather than exactly zero. This makes L1 useful when you suspect that only a subset of features is truly important, as it effectively performs feature selection.
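
Optimizers designed for L1 penalties, such as proximal gradient descent (ISTA), replace the penalty's gradient with a soft-thresholding step that sets small weights exactly to zero. This is a swapped-in technique, not the method used in this chapter. The sketch below fits a linear model with plain tensor operations on hypothetical synthetic data in which only the first two of six features matter; the data term is written as $\frac{1}{2n}\lVert Xw - y \rVert^2$, and `lam`, `lr`, and the iteration count are illustrative choices.

```python
import torch

torch.manual_seed(0)

# Hypothetical synthetic data: only the first two of six features affect y
n, d = 200, 6
X = torch.randn(n, d)
w_true = torch.tensor([3.0, -2.0, 0.0, 0.0, 0.0, 0.0])
y = X @ w_true

lam = 0.1  # L1 coefficient, illustrative
lr = 0.1   # step size, illustrative
w = torch.zeros(d)

for _ in range(2000):
    # Gradient step on the data term (1/(2n)) * ||Xw - y||^2
    grad = X.T @ (X @ w - y) / n
    w = w - lr * grad
    # Proximal step for lam * sum(|w|): soft-thresholding by lr * lam
    w = torch.sign(w) * torch.clamp(torch.abs(w) - lr * lam, min=0.0)

print(w)
```

The two relevant weights end up slightly shrunk from their true values, while the four irrelevant ones are exactly zero, which is the feature-selection behavior described above.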

L2 regularization, on the other hand, penalizes the squares of the weights, leading to smaller but nonzero weights. Instead of eliminating individual weights, it shrinks all of them, penalizing the largest ones most heavily. As a result, L2 regularization is beneficial when you want to reduce the influence of less important features without forcing them out of the model.
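
For contrast, the sketch below fits a linear model on hypothetical synthetic data (only the first two of six features matter) with $\lambda \sum w^2$ added to a data term $\frac{1}{2n}\lVert Xw - y \rVert^2$, using ordinary gradient steps. `lam`, `lr`, and the iteration count are illustrative. Because the features are roughly uncorrelated with unit variance, the relevant weights shrink to approximately $w/(1 + 2\lambda)$, and the irrelevant ones end up small but, in general, not exactly zero.

```python
import torch

torch.manual_seed(0)

# Hypothetical synthetic data: only the first two of six features affect y
n, d = 200, 6
X = torch.randn(n, d)
w_true = torch.tensor([3.0, -2.0, 0.0, 0.0, 0.0, 0.0])
y = X @ w_true

lam = 0.1  # L2 coefficient, illustrative
lr = 0.1   # step size, illustrative
w = torch.zeros(d)

for _ in range(2000):
    # Gradient of (1/(2n)) * ||Xw - y||^2 + lam * sum(w^2)
    grad = X.T @ (X @ w - y) / n + 2 * lam * w
    w = w - lr * grad

print(w)
```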

Choosing between L1 and L2 regularization depends on your specific goals:

Use L1 when you desire sparsity and interpretability;
Use L2 when you want to control overall weight magnitude and maintain contributions from all features.

Section 3. Chapter 1