Scaling and Gradient Descent
When you use gradient descent to optimize a machine learning model, the shape of the loss surface largely determines how quickly and reliably the algorithm converges to a minimum. If your features are not scaled, those with larger ranges dominate the loss function, stretching the contours of the loss surface into elongated, skewed ellipses. This distortion leads to inefficient optimization paths: gradient descent zig-zags across the steep direction while taking only tiny steps along the flat one. As a result, convergence becomes much slower, and with too large a learning rate the optimizer can oscillate or diverge instead of settling into the minimum. Feature scaling, such as standardization or normalization, transforms the data so that all features contribute on a comparable scale. This produces a more spherical loss surface, allowing gradient descent to move efficiently and directly toward the minimum.
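As a quick illustration of the scaling step itself, here is a minimal sketch of standardization (z-scoring) in NumPy. The two features and their values are made up purely for illustration:

import numpy as np

# Hypothetical data: two features on very different scales,
# e.g. house size in square feet and number of bedrooms.
X = np.array([
    [2100.0, 3.0],
    [1600.0, 2.0],
    [2400.0, 4.0],
    [1400.0, 2.0],
    [3000.0, 5.0],
])

# Standardization (z-score): subtract each feature's mean and divide by its
# standard deviation, so every feature ends up with mean 0 and unit variance.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print("Std before scaling:", X.std(axis=0))
print("Std after scaling: ", X_scaled.std(axis=0))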
Analogy: imagine hiking down a steep, narrow canyon (unscaled features) versus rolling down a smooth, round hill (scaled features). In the canyon, you must zig-zag and carefully pick your steps to avoid obstacles, making your journey slow and indirect. On the hill, you can move straight toward the bottom, reaching your goal much faster. Scaling features reshapes the optimization landscape from a canyon to a hill, making gradient descent more efficient.
import numpy as np
import matplotlib.pyplot as plt

# Create a synthetic loss surface for two features
def loss_surface(w1, w2, scale_x=1, scale_y=10):
    return (scale_x * w1)**2 + (scale_y * w2)**2

w1 = np.linspace(-2, 2, 100)
w2 = np.linspace(-2, 2, 100)
W1, W2 = np.meshgrid(w1, w2)

# Unscaled (features have different variances)
Z_unscaled = loss_surface(W1, W2, scale_x=1, scale_y=10)

# Scaled (features have same variance)
Z_scaled = loss_surface(W1, W2, scale_x=1, scale_y=1)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

axes[0].contour(W1, W2, Z_unscaled, levels=20, cmap='viridis')
axes[0].set_title('Unscaled Features (Elongated Contours)')
axes[0].set_xlabel('Weight 1')
axes[0].set_ylabel('Weight 2')

axes[1].contour(W1, W2, Z_scaled, levels=20, cmap='viridis')
axes[1].set_title('Scaled Features (Circular Contours)')
axes[1].set_xlabel('Weight 1')
axes[1].set_ylabel('Weight 2')

plt.tight_layout()
plt.show()
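To see how this changes the optimization itself, the following follow-up sketch (reusing the same quadratic loss as above) runs plain gradient descent on both surfaces and counts the iterations needed to get close to the minimum. The learning rates here are illustrative choices near each surface's stability limit: the steep direction of the unscaled surface forces a much smaller step size, so it needs far more iterations.

import numpy as np

# Gradient of the quadratic loss used above: (scale_x*w1)^2 + (scale_y*w2)^2
def gradient(w, scale_x, scale_y):
    return np.array([2 * scale_x**2 * w[0], 2 * scale_y**2 * w[1]])

def run_gd(scale_x, scale_y, lr, start=(1.5, 1.5), tol=1e-3, max_iter=10_000):
    """Run gradient descent and return the number of iterations to converge."""
    w = np.array(start, dtype=float)
    for i in range(max_iter):
        if np.linalg.norm(w) < tol:
            return i
        w -= lr * gradient(w, scale_x, scale_y)
    return max_iter

# The step size must stay below 2 / (largest curvature) to remain stable,
# so the unscaled surface (curvature 200 along w2) forces a tiny learning rate,
# while the scaled surface (curvature 2 in every direction) tolerates a large one.
iters_unscaled = run_gd(scale_x=1, scale_y=10, lr=0.009)
iters_scaled = run_gd(scale_x=1, scale_y=1, lr=0.9)

print(f"Iterations to converge (unscaled): {iters_unscaled}")
print(f"Iterations to converge (scaled):   {iters_scaled}")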