Batch Normalization and Early Stopping
Batch normalization and early stopping are two powerful techniques that support regularization and stable training in neural networks.
Batch Normalization
- Batch normalization addresses internal covariate shift by normalizing each layer's activations to a stable mean and variance, typically computed across the current mini-batch, and then rescaling them with learnable parameters (a short demonstration follows this list);
- This normalization helps gradients flow more smoothly through the network, allowing for higher learning rates and faster convergence;
- Batch normalization also acts as a regularizer, sometimes reducing the need for other techniques such as dropout.
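A quick way to see what batch normalization does is to pass a random batch through nn.BatchNorm1d and compare its output with the manual formula. The snippet below is a minimal sketch; the batch size, feature count, and variable names are illustrative, and the eps, weight (gamma), and bias (beta) attributes are PyTorch's standard BatchNorm parameters.

import torch
import torch.nn as nn

torch.manual_seed(0)

# A batch of 8 samples with 4 features, deliberately shifted and scaled
x = torch.randn(8, 4) * 5.0 + 3.0

bn = nn.BatchNorm1d(4)   # gamma initialized to 1, beta to 0
bn.train()               # training mode: normalize with batch statistics
y = bn(x)

# Per-feature statistics: roughly zero mean and unit variance after normalization
print("input  mean/std:", x.mean(dim=0), x.std(dim=0))
print("output mean/std:", y.mean(dim=0), y.std(dim=0))

# Manual computation: (x - batch_mean) / sqrt(batch_var + eps), then scale and shift
mean = x.mean(dim=0)
var = x.var(dim=0, unbiased=False)  # BatchNorm uses the biased variance
manual = (x - mean) / torch.sqrt(var + bn.eps) * bn.weight + bn.bias
print("max difference vs. manual formula:", (y - manual).abs().max().item())

During training the layer uses the statistics of the current mini-batch, while in evaluation mode (bn.eval()) it switches to the running averages it accumulated during training.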
Early Stopping
- Early stopping is a form of regularization that halts training when the model's performance on a validation set stops improving;
- By monitoring validation loss, early stopping prevents overfitting, ensuring that the model does not continue to optimize excessively on the training data at the expense of generalization (a distilled sketch of this logic follows this list).
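The core of early stopping is just a counter: save the best weights whenever the validation loss improves, and stop once it has failed to improve for a set number of consecutive epochs (the patience). The helper class below is a hypothetical sketch of that logic; the class and method names are illustrative, not part of any library.

import copy

class EarlyStopping:
    """Stop training when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best_loss = float("inf")
        self.wait = 0
        self.best_state = None

    def step(self, val_loss, model):
        """Record the current validation loss; return True if training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            # Deep-copy so later optimizer updates don't overwrite the snapshot
            self.best_state = copy.deepcopy(model.state_dict())
            self.wait = 0
            return False
        self.wait += 1
        return self.wait >= self.patience

    def restore_best(self, model):
        """Load the best weights seen so far back into the model."""
        if self.best_state is not None:
            model.load_state_dict(self.best_state)

Inside a training loop you would call step(val_loss, model) after each validation pass and break when it returns True; the full example below inlines the same logic without a helper class.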
Together, these methods enhance both the stability and generalization ability of neural networks.
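The complete example below combines both techniques: a small classifier with BatchNorm1d layers is trained on synthetic data, the validation loss is checked after every epoch, and training stops once it has failed to improve for five consecutive epochs, after which the best weights are restored.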
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import copy

# Data generation
X = np.random.randn(1000, 20).astype(np.float32)
y = np.random.randint(0, 2, size=1000).astype(np.int64)

# Train/validation split
split = int(0.8 * len(X))
X_train, X_val = X[:split], X[split:]
y_train, y_val = y[:split], y[split:]

# Convert to torch tensors
torch_X_train = torch.from_numpy(X_train)
torch_y_train = torch.from_numpy(y_train)
torch_X_val = torch.from_numpy(X_val)
torch_y_val = torch.from_numpy(y_val)

# Model with batch normalization after each hidden linear layer
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(20, 64),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.BatchNorm1d(32),
            nn.ReLU(),
            nn.Linear(32, 2)
        )

    def forward(self, x):
        return self.layers(x)

model = SimpleNet()
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Early stopping setup: stop after `patience` epochs without validation improvement
best_loss = float('inf')
patience = 5
wait = 0
best_state = None

for epoch in range(50):
    # Training (full-batch for simplicity)
    model.train()
    optimizer.zero_grad()
    out = model(torch_X_train)
    loss = loss_fn(out, torch_y_train)
    loss.backward()
    optimizer.step()

    # Validation
    model.eval()
    with torch.no_grad():
        val_out = model(torch_X_val)
        val_loss = loss_fn(val_out, torch_y_val).item()

    if val_loss < best_loss:
        best_loss = val_loss
        # Deep-copy the weights; state_dict() returns references that
        # later optimizer steps would otherwise overwrite
        best_state = copy.deepcopy(model.state_dict())
        wait = 0
    else:
        wait += 1
        if wait >= patience:
            break

# Restore the best weights found during training
if best_state is not None:
    model.load_state_dict(best_state)

print("Training stopped after", epoch + 1, "epochs.")
print("Best validation loss:", round(best_loss, 6))