Scaling and Gradient Descent | Scaling and Model Performance
Feature Scaling and Normalization Deep Dive

Scaling and Gradient Descent

When you use gradient descent to optimize a machine learning model, the shape of the loss surface is crucial for determining how quickly and effectively the algorithm converges to a minimum. If your features are not scaled, those with larger ranges will dominate the loss function, causing the contours of the loss surface to become elongated and skewed. This distortion leads to inefficient optimization paths, where gradient descent zig-zags or takes tiny steps in some directions and much larger steps in others. As a result, convergence becomes much slower, and the optimizer may even get stuck or fail to reach the true minimum. Feature scaling, such as standardization or normalization, transforms the data so that all features contribute equally. This produces a more spherical loss surface, allowing gradient descent to move efficiently and directly toward the minimum.
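
In practice, standardization and min-max normalization are one-liners. Below is a minimal sketch, assuming scikit-learn is installed, that applies both to a small made-up feature matrix in which the second column dwarfs the first; the values and variable names are illustrative only.

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Made-up feature matrix: column 0 spans roughly 0-1, column 1 spans thousands
X = np.array([[0.5, 20000.0],
              [0.3, 45000.0],
              [0.9, 130000.0],
              [0.1, 8000.0]])

# Standardization: rescale each feature to zero mean and unit variance
X_std = StandardScaler().fit_transform(X)

# Min-max normalization: rescale each feature to the [0, 1] range
X_minmax = MinMaxScaler().fit_transform(X)

print("Standardized:\n", X_std)
print("Min-max normalized:\n", X_minmax)

After either transformation, both columns vary on a comparable scale, so no single weight direction dominates the loss surface.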

Note

Analogy: imagine hiking down a steep, narrow canyon (unscaled features) versus rolling down a smooth, round hill (scaled features). In the canyon, you must zig-zag and carefully pick your steps to avoid obstacles, making your journey slow and indirect. On the hill, you can move straight toward the bottom, reaching your goal much faster. Scaling features reshapes the optimization landscape from a canyon to a hill, making gradient descent more efficient.

import numpy as np
import matplotlib.pyplot as plt

# Create a synthetic loss surface for two features
def loss_surface(w1, w2, scale_x=1, scale_y=10):
    return (scale_x * w1)**2 + (scale_y * w2)**2

w1 = np.linspace(-2, 2, 100)
w2 = np.linspace(-2, 2, 100)
W1, W2 = np.meshgrid(w1, w2)

# Unscaled (features have different variances)
Z_unscaled = loss_surface(W1, W2, scale_x=1, scale_y=10)

# Scaled (features have same variance)
Z_scaled = loss_surface(W1, W2, scale_x=1, scale_y=1)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

axes[0].contour(W1, W2, Z_unscaled, levels=20, cmap='viridis')
axes[0].set_title('Unscaled Features (Elongated Contours)')
axes[0].set_xlabel('Weight 1')
axes[0].set_ylabel('Weight 2')

axes[1].contour(W1, W2, Z_scaled, levels=20, cmap='viridis')
axes[1].set_title('Scaled Features (Circular Contours)')
axes[1].set_xlabel('Weight 1')
axes[1].set_ylabel('Weight 2')

plt.tight_layout()
plt.show()
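
To see how much the surface shape matters, the following sketch runs plain gradient descent on the same two quadratic surfaces and counts the iterations needed to get close to the minimum. The learning rates, starting point, and tolerance are arbitrary illustrative choices: the elongated surface forces a small step size (anything much larger diverges along the steep w2 direction), while the circular surface tolerates a much larger one.

import numpy as np

# Gradient of the quadratic loss L(w1, w2) = (a*w1)**2 + (b*w2)**2
def grad(w, a, b):
    return np.array([2 * a**2 * w[0], 2 * b**2 * w[1]])

def gradient_descent(a, b, lr, start=(1.5, 1.5), tol=1e-6, max_iter=10_000):
    w = np.array(start, dtype=float)
    for step in range(1, max_iter + 1):
        w -= lr * grad(w, a, b)
        if (a * w[0])**2 + (b * w[1])**2 < tol:
            return step
    return max_iter

# Unscaled surface (a=1, b=10): the steep w2 direction caps the stable
# learning rate, so progress along the shallow w1 direction is slow
steps_unscaled = gradient_descent(a=1, b=10, lr=0.009)

# Scaled surface (a=1, b=1): a much larger learning rate is stable,
# and the optimizer heads almost straight to the minimum
steps_scaled = gradient_descent(a=1, b=1, lr=0.3)

print(f"Unscaled surface: {steps_unscaled} iterations")
print(f"Scaled surface:   {steps_scaled} iterations")

On the unscaled surface this run takes a few hundred iterations; on the scaled surface it finishes in roughly ten, which is exactly the zig-zag-versus-straight-line behavior the contour plots suggest.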

Which of the following best describes the effect of feature scaling on gradient descent optimization?

Select the correct answer
