Scaling and Gradient Descent | Scaling and Model Performance
Feature Scaling and Normalization Deep Dive

Scaling and Gradient Descent

When you use gradient descent to optimize a machine learning model, the shape of the loss surface is crucial for determining how quickly and effectively the algorithm converges to a minimum. If your features are not scaled, those with larger ranges will dominate the loss function, causing the contours of the loss surface to become elongated and skewed. This distortion leads to inefficient optimization paths, where gradient descent zig-zags or takes tiny steps in some directions and much larger steps in others. As a result, convergence becomes much slower, and the optimizer may even get stuck or fail to reach the true minimum. Feature scaling, such as standardization or normalization, transforms the data so that all features contribute equally. This produces a more spherical loss surface, allowing gradient descent to move efficiently and directly toward the minimum.
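
In practice, standardization and min-max normalization are one-liners. Below is a minimal sketch, assuming scikit-learn is installed, that applies both to a small made-up feature matrix in which the second column dwarfs the first; the values and variable names are illustrative only.

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Made-up feature matrix: column 0 spans roughly 0-1, column 1 spans thousands
X = np.array([[0.5, 20000.0],
              [0.3, 45000.0],
              [0.9, 130000.0],
              [0.1, 8000.0]])

# Standardization: rescale each feature to zero mean and unit variance
X_std = StandardScaler().fit_transform(X)

# Min-max normalization: rescale each feature to the [0, 1] range
X_minmax = MinMaxScaler().fit_transform(X)

print("Standardized:\n", X_std)
print("Min-max normalized:\n", X_minmax)

After either transformation, both columns vary on a comparable scale, so no single weight direction dominates the loss surface.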

Note

Analogy: imagine hiking down a steep, narrow canyon (unscaled features) versus rolling down a smooth, round hill (scaled features). In the canyon, you must zig-zag and carefully pick your steps to avoid obstacles, making your journey slow and indirect. On the hill, you can move straight toward the bottom, reaching your goal much faster. Scaling features reshapes the optimization landscape from a canyon to a hill, making gradient descent more efficient.

import numpy as np
import matplotlib.pyplot as plt

# Create a synthetic loss surface for two features
def loss_surface(w1, w2, scale_x=1, scale_y=10):
    return (scale_x * w1)**2 + (scale_y * w2)**2

w1 = np.linspace(-2, 2, 100)
w2 = np.linspace(-2, 2, 100)
W1, W2 = np.meshgrid(w1, w2)

# Unscaled (features have different variances)
Z_unscaled = loss_surface(W1, W2, scale_x=1, scale_y=10)

# Scaled (features have same variance)
Z_scaled = loss_surface(W1, W2, scale_x=1, scale_y=1)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

axes[0].contour(W1, W2, Z_unscaled, levels=20, cmap='viridis')
axes[0].set_title('Unscaled Features (Elongated Contours)')
axes[0].set_xlabel('Weight 1')
axes[0].set_ylabel('Weight 2')

axes[1].contour(W1, W2, Z_scaled, levels=20, cmap='viridis')
axes[1].set_title('Scaled Features (Circular Contours)')
axes[1].set_xlabel('Weight 1')
axes[1].set_ylabel('Weight 2')

plt.tight_layout()
plt.show()
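
To see how much the surface shape matters, the following sketch runs plain gradient descent on the same two quadratic surfaces and counts the iterations needed to get close to the minimum. The learning rates, starting point, and tolerance are arbitrary illustrative choices: the elongated surface forces a small step size (anything much larger diverges along the steep w2 direction), while the circular surface tolerates a much larger one.

import numpy as np

# Gradient of the quadratic loss L(w1, w2) = (a*w1)**2 + (b*w2)**2
def grad(w, a, b):
    return np.array([2 * a**2 * w[0], 2 * b**2 * w[1]])

def gradient_descent(a, b, lr, start=(1.5, 1.5), tol=1e-6, max_iter=10_000):
    w = np.array(start, dtype=float)
    for step in range(1, max_iter + 1):
        w -= lr * grad(w, a, b)
        if (a * w[0])**2 + (b * w[1])**2 < tol:
            return step
    return max_iter

# Unscaled surface (a=1, b=10): the steep w2 direction caps the stable
# learning rate, so progress along the shallow w1 direction is slow
steps_unscaled = gradient_descent(a=1, b=10, lr=0.009)

# Scaled surface (a=1, b=1): a much larger learning rate is stable,
# and the optimizer heads almost straight to the minimum
steps_scaled = gradient_descent(a=1, b=1, lr=0.3)

print(f"Unscaled surface: {steps_unscaled} iterations")
print(f"Scaled surface:   {steps_scaled} iterations")

On the unscaled surface this run takes a few hundred iterations; on the scaled surface it finishes in roughly ten, which is exactly the zig-zag-versus-straight-line behavior the contour plots suggest.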

Which of the following best describes the effect of feature scaling on gradient descent optimization?

Select the correct answer
