Learn L1 and L2 Regularization | Regularization Techniques
Optimization and Regularization in Neural Networks with Python

L1 and L2 Regularization

L1 and L2 regularization are essential techniques to help control the complexity of neural network models and prevent overfitting. Both methods work by adding a penalty term to the loss function, discouraging the model from assigning excessively large values to its weights.

L1 regularization, also known as Lasso regularization, adds the sum of the absolute values of the weights to the loss function. Mathematically, if $w$ represents the model's weights and $\lambda$ is the regularization coefficient, the penalty term is:

$$\lambda \times \sum (|w|)$$

This approach encourages sparsity in the model weights, often driving some weights to exactly zero.

L2 regularization, or Ridge regularization, adds the sum of the squares of the weights to the loss function. Its penalty term is:

$$\lambda \times \sum (w^2)$$

Unlike L1, L2 regularization discourages large weights but does not necessarily drive them to zero, instead shrinking them towards smaller values.
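
One way to see where this difference comes from is to look at what each penalty contributes to a plain gradient descent step. The following is a sketch rather than part of the original lesson: it assumes a learning rate $\eta$ and writes $L_0$ for the original loss, both of which are symbols introduced here for illustration.

```latex
% L1: the penalty gradient has constant magnitude (for w_i \neq 0)
\frac{\partial}{\partial w_i}\Big(\lambda \sum_j |w_j|\Big) = \lambda \,\operatorname{sign}(w_i)
\quad\Longrightarrow\quad
w_i \leftarrow w_i - \eta \frac{\partial L_0}{\partial w_i} - \eta \lambda \,\operatorname{sign}(w_i)

% L2: the penalty gradient is proportional to the weight itself
\frac{\partial}{\partial w_i}\Big(\lambda \sum_j w_j^2\Big) = 2 \lambda w_i
\quad\Longrightarrow\quad
w_i \leftarrow (1 - 2 \eta \lambda)\, w_i - \eta \frac{\partial L_0}{\partial w_i}
```

The L1 term pulls every nonzero weight toward zero by the same fixed amount, so small weights can be pushed all the way to zero. The L2 term pulls in proportion to the weight's size, so the pull fades as the weight gets small.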

Both forms of regularization modify the loss function as follows:

  • L1: $\text{Loss} = \text{Original\_Loss} + \lambda \times \sum (|w|)$;
  • L2: $\text{Loss} = \text{Original\_Loss} + \lambda \times \sum (w^2)$.
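
As a quick arithmetic check of the two formulas, here is a minimal sketch in plain Python. The weight values, the original loss of 0.25, and the coefficient `lam` are hypothetical numbers chosen only to make the sums easy to follow.

```python
# Hypothetical values, chosen only for illustration
weights = [0.5, -1.5, 0.0, 2.0]
original_loss = 0.25
lam = 0.01  # regularization coefficient (lambda)

# L1 penalty: sum of absolute values -> 0.5 + 1.5 + 0.0 + 2.0
l1_penalty = sum(abs(w) for w in weights)

# L2 penalty: sum of squares -> 0.25 + 2.25 + 0.0 + 4.0
l2_penalty = sum(w ** 2 for w in weights)

# Regularized losses: original loss plus the scaled penalty
loss_l1 = original_loss + lam * l1_penalty
loss_l2 = original_loss + lam * l2_penalty

print(f"L1 penalty: {l1_penalty}, L1-regularized loss: {loss_l1:.3f}")
print(f"L2 penalty: {l2_penalty}, L2-regularized loss: {loss_l2:.3f}")
```

Note that the largest weight (2.0) accounts for half of the L1 penalty but well over half of the L2 penalty, which is why L2 penalizes large weights more heavily.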
```python
import torch
import torch.nn as nn

# Simple fully connected model
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 1)

    def forward(self, x):
        return self.fc(x)

# Instantiate model and dummy data
model = SimpleNet()
x = torch.randn(5, 10)
y = torch.randn(5, 1)

# Standard MSE loss
criterion = nn.MSELoss()
output = model(x)
mse_loss = criterion(output, y)

# L1 and L2 regularization coefficients
lambda_l1 = 0.01
lambda_l2 = 0.01

# Calculate L1 regularization (sum of absolute parameter values;
# model.parameters() yields the bias as well as the weights)
l1_penalty = sum(torch.sum(torch.abs(param)) for param in model.parameters())

# Calculate L2 regularization (sum of squared parameter values, bias included)
l2_penalty = sum(torch.sum(param ** 2) for param in model.parameters())

# Total loss with L1 and L2 regularization
loss_l1 = mse_loss + lambda_l1 * l1_penalty
loss_l2 = mse_loss + lambda_l2 * l2_penalty

print("MSE loss:", mse_loss.item())
print("L1-regularized loss:", loss_l1.item())
print("L2-regularized loss:", loss_l2.item())
```
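
The penalty only affects the model once the regularized loss is backpropagated and an optimizer step is taken. The following is a minimal sketch of that loop, not the lesson's own training code: the helper `l1_weights_only`, the seed, the learning rate, and the step count are all assumptions made for illustration. It also restricts the penalty to weight tensors, a common choice that leaves biases unpenalized.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the sketch is repeatable

model = nn.Linear(10, 1)
x = torch.randn(5, 10)
y = torch.randn(5, 1)

criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
lambda_l1 = 0.01  # hypothetical coefficient


def l1_weights_only(module):
    # Sum |w| over parameters named "...weight", skipping biases
    return sum(
        p.abs().sum()
        for name, p in module.named_parameters()
        if name.endswith("weight")
    )


losses = []
for step in range(20):
    optimizer.zero_grad()
    # Regularized loss: data term plus the scaled L1 penalty
    loss = criterion(model(x), y) + lambda_l1 * l1_weights_only(model)
    loss.backward()   # gradients now include the penalty's contribution
    optimizer.step()
    losses.append(loss.item())

print("first loss:", losses[0])
print("last loss:", losses[-1])
```

For L2, PyTorch optimizers such as `torch.optim.SGD` also accept a `weight_decay` argument. In SGD it adds `weight_decay * w` to each gradient, which corresponds to a penalty of `(weight_decay / 2) * sum(w ** 2)`, so it matches the formula above with `lambda = weight_decay / 2`. It applies to every parameter passed to the optimizer, biases included.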

L1 and L2 regularization affect a neural network in different ways. L1 regularization, by penalizing the absolute values of weights, encourages many weights to become exactly zero, resulting in a sparse model. This makes L1 useful when you suspect that only a subset of features is truly important, as it effectively performs feature selection.

L2 regularization, on the other hand, penalizes the square of the weights, leading to smaller but nonzero weights. This tends to distribute the penalty across all weights, shrinking them but rarely eliminating them entirely. As a result, L2 regularization is beneficial when you want to reduce the influence of less important features without forcing them out of the model.
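
The contrast between exact zeros and mere shrinkage can be seen in a one-dimensional toy problem, sketched below in plain Python. This is an illustration under simplifying assumptions, not a neural network: each weight is fitted independently to a target value `a` with a squared-error term, and the function names and numbers are hypothetical.

```python
def l1_shrink(a, lam):
    # Minimizer of 0.5 * (w - a)**2 + lam * abs(w), known as soft-thresholding:
    # values within lam of zero are set exactly to zero.
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


def l2_shrink(a, lam):
    # Minimizer of 0.5 * (w - a)**2 + lam * w**2:
    # every value is scaled by the same factor and stays nonzero.
    return a / (1.0 + 2.0 * lam)


unregularized = [2.0, -0.8, 0.3, -0.05]  # hypothetical unpenalized weights
lam = 0.5

l1_result = [l1_shrink(a, lam) for a in unregularized]
l2_result = [l2_shrink(a, lam) for a in unregularized]

print("L1:", l1_result)
print("L2:", l2_result)
```

With these numbers, the two smallest weights become exactly zero under L1, while under L2 all four weights are halved and none is eliminated.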

Choosing between L1 and L2 regularization depends on your specific goals:

  • Use L1 when you desire sparsity and interpretability;
  • Use L2 when you want to control overall weight magnitude and maintain contributions from all features.

Section 3. Chapter 1
