Scaling and Gradient Descent
When you use gradient descent to optimize a machine learning model, the shape of the loss surface largely determines how quickly and reliably the algorithm converges to a minimum. If your features are not scaled, those with larger ranges dominate the loss function, stretching the contours of the loss surface into elongated, skewed ellipses. This distortion leads to inefficient optimization paths: gradient descent zig-zags across the steep direction while taking only tiny steps along the flat one. As a result, convergence becomes much slower, and with too large a learning rate the optimizer can oscillate or diverge instead of settling into the minimum. Feature scaling, such as standardization or normalization, transforms the data so that all features contribute on a comparable scale. This produces a more spherical loss surface, allowing gradient descent to move efficiently and directly toward the minimum.
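As a quick illustration of the scaling step itself, here is a minimal sketch of standardization (z-scoring) in NumPy. The two features and their values are made up purely for illustration:

import numpy as np

# Hypothetical data: two features on very different scales,
# e.g. house size in square feet and number of bedrooms.
X = np.array([
    [2100.0, 3.0],
    [1600.0, 2.0],
    [2400.0, 4.0],
    [1400.0, 2.0],
    [3000.0, 5.0],
])

# Standardization (z-score): subtract each feature's mean and divide by its
# standard deviation, so every feature ends up with mean 0 and unit variance.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print("Std before scaling:", X.std(axis=0))
print("Std after scaling: ", X_scaled.std(axis=0))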
Analogy: imagine hiking down a steep, narrow canyon (unscaled features) versus rolling down a smooth, round hill (scaled features). In the canyon, you must zig-zag and carefully pick your steps to avoid obstacles, making your journey slow and indirect. On the hill, you can move straight toward the bottom, reaching your goal much faster. Scaling features reshapes the optimization landscape from a canyon to a hill, making gradient descent more efficient.
import numpy as np
import matplotlib.pyplot as plt

# Create a synthetic loss surface for two features
def loss_surface(w1, w2, scale_x=1, scale_y=10):
    return (scale_x * w1)**2 + (scale_y * w2)**2

w1 = np.linspace(-2, 2, 100)
w2 = np.linspace(-2, 2, 100)
W1, W2 = np.meshgrid(w1, w2)

# Unscaled (features have different variances)
Z_unscaled = loss_surface(W1, W2, scale_x=1, scale_y=10)

# Scaled (features have same variance)
Z_scaled = loss_surface(W1, W2, scale_x=1, scale_y=1)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

axes[0].contour(W1, W2, Z_unscaled, levels=20, cmap='viridis')
axes[0].set_title('Unscaled Features (Elongated Contours)')
axes[0].set_xlabel('Weight 1')
axes[0].set_ylabel('Weight 2')

axes[1].contour(W1, W2, Z_scaled, levels=20, cmap='viridis')
axes[1].set_title('Scaled Features (Circular Contours)')
axes[1].set_xlabel('Weight 1')
axes[1].set_ylabel('Weight 2')

plt.tight_layout()
plt.show()
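To see how this changes the optimization itself, the following follow-up sketch (reusing the same quadratic loss as above) runs plain gradient descent on both surfaces and counts the iterations needed to get close to the minimum. The learning rates here are illustrative choices near each surface's stability limit: the steep direction of the unscaled surface forces a much smaller step size, so it needs far more iterations.

import numpy as np

# Gradient of the quadratic loss used above: (scale_x*w1)^2 + (scale_y*w2)^2
def gradient(w, scale_x, scale_y):
    return np.array([2 * scale_x**2 * w[0], 2 * scale_y**2 * w[1]])

def run_gd(scale_x, scale_y, lr, start=(1.5, 1.5), tol=1e-3, max_iter=10_000):
    """Run gradient descent and return the number of iterations to converge."""
    w = np.array(start, dtype=float)
    for i in range(max_iter):
        if np.linalg.norm(w) < tol:
            return i
        w -= lr * gradient(w, scale_x, scale_y)
    return max_iter

# The step size must stay below 2 / (largest curvature) to remain stable,
# so the unscaled surface (curvature 200 along w2) forces a tiny learning rate,
# while the scaled surface (curvature 2 in every direction) tolerates a large one.
iters_unscaled = run_gd(scale_x=1, scale_y=10, lr=0.009)
iters_scaled = run_gd(scale_x=1, scale_y=1, lr=0.9)

print(f"Iterations to converge (unscaled): {iters_unscaled}")
print(f"Iterations to converge (scaled):   {iters_scaled}")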