Feature Scaling and Normalization Deep Dive
Scaling and Model Performance

Visualizing Model Loss Landscapes

When you train a machine learning model, the loss landscape represents how the model's loss changes as its parameters (such as weights) vary. You can picture this landscape as a surface with hills, valleys, and flat regions. The geometry of this surface is crucial: if the landscape is steep and well-shaped, optimization algorithms like gradient descent can quickly find the lowest point — the optimal parameters. However, if the landscape is stretched out, tilted, or warped, optimization becomes harder and convergence can be much slower or unstable. Scaling your features changes the shape of this landscape, often making it more symmetric and easier to navigate.
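For instance, with mean squared error the landscape is the function

$$
L(\mathbf{w}) = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - f(\mathbf{x}_i; \mathbf{w}) \bigr)^2,
$$

where $f(\mathbf{x}_i; \mathbf{w})$ is the model's prediction for example $\mathbf{x}_i$; plotting $L$ against the parameters $\mathbf{w}$ gives exactly the kind of surface described above.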

To build geometric intuition, imagine a simple linear regression with two features: if one feature has a much larger scale than the other, the loss surface will look like a narrow, elongated valley slanting along one axis. This makes it difficult for gradient descent to take efficient steps, as it zig-zags down the valley. When you scale the features so they have similar ranges or variances, the loss surface becomes more like a round bowl, allowing for direct, stable steps toward the minimum.
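One way to quantify this intuition is the condition number of the Gram matrix X^T X, which controls how stretched the elliptical contours of the MSE surface are. The short sketch below is an illustrative addition (not part of the lesson's example) that uses the same style of data as the code that follows: a large condition number corresponds to the narrow, elongated valley, and a value close to 1 to the round bowl.

import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative data in the same style as the example below:
# feature 1 in [0, 1], feature 2 in [0, 100].
rng = np.random.default_rng(0)
X = rng.random((100, 2))
X[:, 1] *= 100

# The curvature of the MSE surface is governed by X^T X; its condition
# number (ratio of largest to smallest eigenvalue) measures how elongated
# the loss contours are.
cond_unscaled = np.linalg.cond(X.T @ X)

# After standardization both features have unit variance, so the condition
# number drops dramatically (close to 1 for nearly uncorrelated features).
X_scaled = StandardScaler().fit_transform(X)
cond_scaled = np.linalg.cond(X_scaled.T @ X_scaled)

print(f"Condition number (unscaled): {cond_unscaled:,.0f}")
print(f"Condition number (scaled): {cond_scaled:.2f}")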

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.preprocessing import StandardScaler

# Generate synthetic data
np.random.seed(0)
X = np.random.rand(100, 2)

# Unscaled: feature 1 in [0, 1], feature 2 in [0, 100]
X_unscaled = np.copy(X)
X_unscaled[:, 1] *= 100
y = 3 * X_unscaled[:, 0] + 0.1 * X_unscaled[:, 1] + np.random.randn(100) * 2

# Define grid for weights
w1 = np.linspace(-1, 5, 50)
w2 = np.linspace(-1, 0.5, 50)
W1, W2 = np.meshgrid(w1, w2)

def compute_loss(X, y, W1, W2):
    """Evaluate the MSE loss on a grid of (weight 1, weight 2) values."""
    loss = np.zeros_like(W1)
    for i in range(W1.shape[0]):
        for j in range(W1.shape[1]):
            y_pred = W1[i, j] * X[:, 0] + W2[i, j] * X[:, 1]
            loss[i, j] = np.mean((y - y_pred) ** 2)
    return loss

# Loss surface for unscaled data
loss_unscaled = compute_loss(X_unscaled, y, W1, W2)

# Scale features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_unscaled)

# Loss surface for scaled data
loss_scaled = compute_loss(X_scaled, y, W1, W2)

fig = plt.figure(figsize=(14, 6))

# Unscaled plot
ax1 = fig.add_subplot(1, 2, 1, projection='3d')
ax1.plot_surface(W1, W2, loss_unscaled, cmap='viridis', alpha=0.9)
ax1.set_title('Loss Surface (Unscaled Features)')
ax1.set_xlabel('Weight 1')
ax1.set_ylabel('Weight 2')
ax1.set_zlabel('MSE Loss')

# Scaled plot
ax2 = fig.add_subplot(1, 2, 2, projection='3d')
ax2.plot_surface(W1, W2, loss_scaled, cmap='plasma', alpha=0.9)
ax2.set_title('Loss Surface (Scaled Features)')
ax2.set_xlabel('Weight 1')
ax2.set_ylabel('Weight 2')
ax2.set_zlabel('MSE Loss')

plt.tight_layout()
plt.show()
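If you prefer a top-down view, contour plots make the valley-versus-bowl contrast even easier to see. The following sketch is an optional addition that reuses W1, W2, loss_unscaled, and loss_scaled from the code above, so run it in the same session:

import matplotlib.pyplot as plt

# Reuses W1, W2, loss_unscaled, loss_scaled from the example above.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

ax1.contour(W1, W2, loss_unscaled, levels=30, cmap='viridis')
ax1.set_title('Loss Contours (Unscaled Features)')
ax1.set_xlabel('Weight 1')
ax1.set_ylabel('Weight 2')

ax2.contour(W1, W2, loss_scaled, levels=30, cmap='plasma')
ax2.set_title('Loss Contours (Scaled Features)')
ax2.set_xlabel('Weight 1')
ax2.set_ylabel('Weight 2')

plt.tight_layout()
plt.show()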
Note

Scaling features transforms the loss landscape from a stretched, narrow valley into a more symmetric and well-conditioned surface. This reshaping lets optimization algorithms take more direct paths to the minimum, reducing the number of steps needed and making convergence more stable. In practice, this means your models can train faster and are less likely to get stuck or diverge due to poor conditioning.
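To see the effect on training rather than just on the geometry, here is a minimal sketch (an illustrative addition: the helpers gradient_descent_mse and mse and all step sizes are choices made for this sketch, not part of the lesson's code) that runs plain batch gradient descent for a fixed budget on the same kind of data, with and without standardization:

import numpy as np
from sklearn.preprocessing import StandardScaler

def gradient_descent_mse(X, y, lr, n_iter=500):
    """Plain batch gradient descent on the MSE loss (no bias term)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def mse(X, y, w):
    return np.mean((y - X @ w) ** 2)

# Same style of synthetic data as the example above.
np.random.seed(0)
X = np.random.rand(100, 2)
X[:, 1] *= 100
y = 3 * X[:, 0] + 0.1 * X[:, 1] + np.random.randn(100) * 2
X_scaled = StandardScaler().fit_transform(X)

# Best achievable MSE for each representation (closed-form least squares).
best_unscaled = mse(X, y, np.linalg.lstsq(X, y, rcond=None)[0])
best_scaled = mse(X_scaled, y, np.linalg.lstsq(X_scaled, y, rcond=None)[0])

# Step sizes are illustrative: the unscaled problem diverges for step sizes
# much above roughly 3e-4, while the standardized one is stable at 0.1.
w_unscaled = gradient_descent_mse(X, y, lr=1e-4)
w_scaled = gradient_descent_mse(X_scaled, y, lr=0.1)

print(f"Unscaled: MSE after 500 steps = {mse(X, y, w_unscaled):.3f} "
      f"(optimum {best_unscaled:.3f})")
print(f"Scaled:   MSE after 500 steps = {mse(X_scaled, y, w_scaled):.3f} "
      f"(optimum {best_scaled:.3f})")

You should see the standardized run reach roughly the least-squares optimum within the 500-step budget, while the unscaled run, forced to use a far smaller step size, remains noticeably above its optimum; this is the practical payoff described in the note above.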

Question

Which statement best describes the effect of feature scaling on the loss landscape for gradient-based optimization?

