Visual Intuition for Convergence
When you optimize machine learning models, you rarely work with simple, smooth valleys. Most real-world loss surfaces are high-dimensional and filled with hills, valleys, plateaus, and saddle points. On these complex landscapes, the path your optimizer takes is rarely a straight line to the bottom. Instead, optimization trajectories may zigzag, spiral, or even get stuck for long periods before escaping to lower regions. This behavior is shaped by the geometry of the surface—such as the presence of sharp ridges, flat regions, or multiple local minima. In high dimensions, these effects become even more pronounced, and visualizing them directly is challenging. However, by studying lower-dimensional analogs, you can build strong geometric intuition for how and why optimizers behave as they do.
```python
import numpy as np
import matplotlib.pyplot as plt

# Define a complex, multi-modal surface (e.g., Himmelblau's function)
def himmelblau(X):
    x, y = X
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

# Gradient of Himmelblau's function
def grad_himmelblau(X):
    x, y = X
    dx = 4 * x * (x**2 + y - 11) + 2 * (x + y**2 - 7)
    dy = 2 * (x**2 + y - 11) + 4 * y * (x + y**2 - 7)
    return np.array([dx, dy])

# Gradient descent parameters
lr = 0.01
steps = 100
start = np.array([-4.0, 4.0])
trajectory = [start.copy()]

# Perform gradient descent
x = start.copy()
for _ in range(steps):
    grad = grad_himmelblau(x)
    x -= lr * grad
    trajectory.append(x.copy())

trajectory = np.array(trajectory)

# Plot the surface and trajectory
xlist = np.linspace(-6, 6, 400)
ylist = np.linspace(-6, 6, 400)
X, Y = np.meshgrid(xlist, ylist)
Z = himmelblau([X, Y])

plt.figure(figsize=(8, 6))
plt.contour(X, Y, Z, levels=50, cmap="jet")
plt.plot(trajectory[:, 0], trajectory[:, 1], marker='o', color='red',
         markersize=3, linewidth=2, label='Gradient Descent Path')
plt.scatter([start[0]], [start[1]], color='green', s=80, label='Start')
plt.title("Optimization Trajectory on Himmelblau's Function")
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
```
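For orientation, Himmelblau's function has four local minima, all with value zero: (3, 2) exactly, plus three others whose coordinates are usually quoted as rounded approximations (used as such below). This short, self-contained sketch marks them on the same contour plot, which makes it easier to judge which basin a given trajectory ends up in.

```python
import numpy as np
import matplotlib.pyplot as plt

def himmelblau(X):
    x, y = X
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

# Approximate locations of Himmelblau's four minima (rounded; only (3, 2) is exact)
minima = np.array([
    [3.0, 2.0],
    [-2.805, 3.131],
    [-3.779, -3.283],
    [3.584, -1.848],
])

xlist = np.linspace(-6, 6, 400)
ylist = np.linspace(-6, 6, 400)
X, Y = np.meshgrid(xlist, ylist)
Z = himmelblau([X, Y])

plt.figure(figsize=(8, 6))
plt.contour(X, Y, Z, levels=50, cmap="jet")
plt.scatter(minima[:, 0], minima[:, 1], color='black', marker='x', s=80, label='Known minima')
plt.title("Minima of Himmelblau's Function")
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
```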
To improve convergence in difficult optimization landscapes, you can:
- Use momentum or adaptive optimizers to help escape flat regions or saddle points (see the sketch after this list);
- Carefully tune the learning rate to avoid overshooting or getting stuck;
- Apply learning rate schedules that decrease step size as you approach minima;
- If possible, initialize parameters in diverse locations to reduce the risk of poor local minima.
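As a concrete sketch of the first and third points, the example below reruns the same Himmelblau setup with classical (heavy-ball) momentum plus a simple exponential learning-rate decay. The momentum coefficient, decay factor, and step count are arbitrary illustrative choices rather than tuned recommendations, and the resulting path may settle in a different basin than the plain gradient-descent run above.

```python
import numpy as np
import matplotlib.pyplot as plt

def himmelblau(X):
    x, y = X
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

def grad_himmelblau(X):
    x, y = X
    dx = 4 * x * (x**2 + y - 11) + 2 * (x + y**2 - 7)
    dy = 2 * (x**2 + y - 11) + 4 * y * (x + y**2 - 7)
    return np.array([dx, dy])

# Illustrative hyperparameters (not tuned)
lr0 = 0.01        # initial learning rate
momentum = 0.9    # heavy-ball momentum coefficient
decay = 0.99      # per-step exponential learning-rate decay
steps = 100

x = np.array([-4.0, 4.0])
velocity = np.zeros_like(x)
trajectory = [x.copy()]

for t in range(steps):
    lr = lr0 * decay**t                                        # shrink the step size over time
    velocity = momentum * velocity - lr * grad_himmelblau(x)   # accumulate past gradients
    x = x + velocity                                           # heavy-ball update
    trajectory.append(x.copy())

trajectory = np.array(trajectory)

# Plot the momentum trajectory on the same contour plot as before
xlist = np.linspace(-6, 6, 400)
ylist = np.linspace(-6, 6, 400)
X, Y = np.meshgrid(xlist, ylist)
Z = himmelblau([X, Y])

plt.figure(figsize=(8, 6))
plt.contour(X, Y, Z, levels=50, cmap="jet")
plt.plot(trajectory[:, 0], trajectory[:, 1], marker='o', color='purple',
         markersize=3, linewidth=2, label='Momentum + LR Decay Path')
plt.scatter([trajectory[0, 0]], [trajectory[0, 1]], color='green', s=80, label='Start')
plt.title("Momentum with Learning-Rate Decay on Himmelblau's Function")
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
```

Comparing this path with the plain gradient-descent path above is a quick way to see how the velocity term and the shrinking step size change the shape of the trajectory.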