Visualizing Model Loss Landscapes
When you train a machine learning model, the loss landscape represents how the model's loss changes as its parameters (such as weights) vary. You can picture this landscape as a surface with hills, valleys, and flat regions. The geometry of this surface is crucial: if the landscape is steep and well-shaped, optimization algorithms like gradient descent can quickly find the lowest point — the optimal parameters. However, if the landscape is stretched out, tilted, or warped, optimization becomes harder and convergence can be much slower or unstable. Scaling your features changes the shape of this landscape, often making it more symmetric and easier to navigate.
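To make this concrete, the short sketch below evaluates the mean squared error of a one-parameter linear model at several candidate weights; printing the loss at each weight is the one-dimensional analogue of the surfaces plotted later. The data and weight range here are made up purely for illustration.

import numpy as np

# Minimal sketch: loss as a function of a single weight w for y ~ w * x.
# Synthetic data, chosen only for illustration; the true weight is 2.0.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=50)
y = 2.0 * x + rng.normal(scale=0.1, size=50)

# Evaluate the MSE loss at a range of candidate weights.
weights = np.linspace(-1, 5, 13)
losses = [np.mean((y - w * x) ** 2) for w in weights]

for w, loss in zip(weights, losses):
    print(f"w = {w:5.2f}  ->  MSE = {loss:.3f}")  # the minimum sits near w = 2.0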
To build geometric intuition, imagine a simple linear regression with two features: if one feature has a much larger scale than the other, the loss surface will look like a narrow, elongated valley slanting along one axis. This makes it difficult for gradient descent to take efficient steps, as it zig-zags down the valley. When you scale the features so they have similar ranges or variances, the loss surface becomes more like a round bowl, allowing for direct, stable steps toward the minimum.
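As a rough numerical check on this intuition, the sketch below runs plain batch gradient descent on the same kind of two-feature data used in the next snippet, once with raw features and once after standardization, and reports how many update steps each run takes. The learning rates, tolerance, and step cap are illustrative choices, not values from this lesson.

import numpy as np
from sklearn.preprocessing import StandardScaler

def gradient_descent(X, y, lr, n_steps=10000, tol=1e-6):
    """Plain batch gradient descent on the MSE; returns weights and steps taken."""
    w = np.zeros(X.shape[1])
    for step in range(1, n_steps + 1):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w_new = w - lr * grad
        if np.linalg.norm(w_new - w) < tol:
            return w_new, step
        w = w_new
    return w, n_steps

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 2))
X[:, 1] *= 100                       # second feature on a much larger scale
y = 3 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(scale=2, size=100)

X_scaled = StandardScaler().fit_transform(X)

# The unscaled problem forces a tiny learning rate to stay stable and
# typically hits the step cap; the scaled problem converges in far fewer steps.
_, steps_unscaled = gradient_descent(X, y, lr=1e-4)
_, steps_scaled = gradient_descent(X_scaled, y, lr=0.1)
print("steps (unscaled):", steps_unscaled)
print("steps (scaled):  ", steps_scaled)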
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.preprocessing import StandardScaler

# Generate synthetic data
np.random.seed(0)
X = np.random.rand(100, 2)

# Unscaled: feature 1 in [0,1], feature 2 in [0,100]
X_unscaled = np.copy(X)
X_unscaled[:, 1] *= 100
y = 3 * X_unscaled[:, 0] + 0.1 * X_unscaled[:, 1] + np.random.randn(100) * 2

# Define grid for weights
w1 = np.linspace(-1, 5, 50)
w2 = np.linspace(-1, 0.5, 50)
W1, W2 = np.meshgrid(w1, w2)

def compute_loss(X, y, W1, W2):
    # Evaluate the MSE at every (w1, w2) pair on the weight grid
    loss = np.zeros_like(W1)
    for i in range(W1.shape[0]):
        for j in range(W1.shape[1]):
            y_pred = W1[i, j] * X[:, 0] + W2[i, j] * X[:, 1]
            loss[i, j] = np.mean((y - y_pred) ** 2)
    return loss

# Loss surface for unscaled data
loss_unscaled = compute_loss(X_unscaled, y, W1, W2)

# Scale features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_unscaled)

# Loss surface for scaled data
loss_scaled = compute_loss(X_scaled, y, W1, W2)

fig = plt.figure(figsize=(14, 6))

# Unscaled plot
ax1 = fig.add_subplot(1, 2, 1, projection='3d')
ax1.plot_surface(W1, W2, loss_unscaled, cmap='viridis', alpha=0.9)
ax1.set_title('Loss Surface (Unscaled Features)')
ax1.set_xlabel('Weight 1')
ax1.set_ylabel('Weight 2')
ax1.set_zlabel('MSE Loss')

# Scaled plot
ax2 = fig.add_subplot(1, 2, 2, projection='3d')
ax2.plot_surface(W1, W2, loss_scaled, cmap='plasma', alpha=0.9)
ax2.set_title('Loss Surface (Scaled Features)')
ax2.set_xlabel('Weight 1')
ax2.set_ylabel('Weight 2')
ax2.set_zlabel('MSE Loss')

plt.tight_layout()
plt.show()
Scaling features transforms the loss landscape from a stretched, narrow valley into a more symmetric and well-conditioned surface. This reshaping lets optimization algorithms take more direct paths to the minimum, reducing the number of steps needed and making convergence more stable. In practice, this means your models can train faster and are less likely to get stuck or diverge due to poor conditioning.
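A quick way to put a number on "well-conditioned" is the condition number of the Gram matrix of the features (X.T @ X in NumPy terms), which for mean-squared-error linear regression controls how stretched the loss valley is. The sketch below compares it before and after standardization on synthetic data similar to the example above; the exact values depend on the random draw and are only illustrative.

import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 2))
X[:, 1] *= 100                      # one feature on a much larger scale

X_scaled = StandardScaler().fit_transform(X)

# For MSE linear regression the Hessian is proportional to X.T @ X,
# so its condition number measures how elongated the loss valley is.
cond_unscaled = np.linalg.cond(X.T @ X)
cond_scaled = np.linalg.cond(X_scaled.T @ X_scaled)

print(f"condition number (unscaled): {cond_unscaled:.1f}")
print(f"condition number (scaled):   {cond_scaled:.1f}")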