Visual Intuition for Convergence | Convergence and Theoretical Guarantees
Mathematics of Optimization in ML

Visual Intuition for Convergence

When you optimize machine learning models, you rarely work with simple, smooth valleys. Most real-world loss surfaces are high-dimensional and filled with hills, valleys, plateaus, and saddle points. On these complex landscapes, the path your optimizer takes is rarely a straight line to the bottom. Instead, optimization trajectories may zigzag, spiral, or even get stuck for long periods before escaping to lower regions. This behavior is shaped by the geometry of the surface—such as the presence of sharp ridges, flat regions, or multiple local minima. In high dimensions, these effects become even more pronounced, and visualizing them directly is challenging. However, by studying lower-dimensional analogs, you can build strong geometric intuition for how and why optimizers behave as they do.

import numpy as np
import matplotlib.pyplot as plt

# Define a complex, multi-modal surface (e.g., Himmelblau's function)
def himmelblau(X):
    x, y = X
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

# Gradient of Himmelblau's function
def grad_himmelblau(X):
    x, y = X
    dx = 4 * x * (x**2 + y - 11) + 2 * (x + y**2 - 7)
    dy = 2 * (x**2 + y - 11) + 4 * y * (x + y**2 - 7)
    return np.array([dx, dy])

# Gradient descent parameters
lr = 0.01
steps = 100
start = np.array([-4.0, 4.0])
trajectory = [start.copy()]

# Perform gradient descent
x = start.copy()
for _ in range(steps):
    grad = grad_himmelblau(x)
    x -= lr * grad
    trajectory.append(x.copy())
trajectory = np.array(trajectory)

# Plot the surface and trajectory
xlist = np.linspace(-6, 6, 400)
ylist = np.linspace(-6, 6, 400)
X, Y = np.meshgrid(xlist, ylist)
Z = himmelblau([X, Y])

plt.figure(figsize=(8, 6))
plt.contour(X, Y, Z, levels=50, cmap="jet")
plt.plot(trajectory[:, 0], trajectory[:, 1], marker='o', color='red',
         markersize=3, linewidth=2, label='Gradient Descent Path')
plt.scatter([start[0]], [start[1]], color='green', s=80, label='Start')
plt.title("Optimization Trajectory on Himmelblau's Function")
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
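Himmelblau's function has four local minima of equal value, so which basin the red path settles into depends on the starting point and the learning rate. Notice in the plot how the trajectory bends with the contour lines instead of heading straight for a minimum; this is the geometric behavior described above.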
Note

To improve convergence in difficult optimization landscapes, you can:

  • Use momentum or adaptive optimizers to help escape flat regions or saddle points (a brief momentum sketch follows this list);
  • Carefully tune the learning rate to avoid overshooting or getting stuck;
  • Apply learning rate schedules that decrease step size as you approach minima;
  • If possible, initialize parameters in diverse locations to reduce the risk of poor local minima.
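As a rough sketch of the first point, the snippet below repeats the gradient from the example above and adds a heavy-ball momentum term. The velocity variable accumulates past gradients, which helps the iterate keep moving through flat stretches. The values of lr and beta here are illustrative choices, not prescribed settings, and may need tuning before the path settles into a minimum.

import numpy as np

# Minimal sketch: gradient descent with momentum on Himmelblau's function.
# lr and beta are illustrative values and may need tuning.
def grad_himmelblau(X):
    x, y = X
    dx = 4 * x * (x**2 + y - 11) + 2 * (x + y**2 - 7)
    dy = 2 * (x**2 + y - 11) + 4 * y * (x + y**2 - 7)
    return np.array([dx, dy])

lr = 0.005      # step size
beta = 0.9      # momentum coefficient
steps = 200

x = np.array([-4.0, 4.0])
velocity = np.zeros_like(x)
trajectory = [x.copy()]

for _ in range(steps):
    grad = grad_himmelblau(x)
    velocity = beta * velocity - lr * grad  # accumulate a running direction
    x += velocity                           # step along the accumulated velocity
    trajectory.append(x.copy())

trajectory = np.array(trajectory)
print("Final point with momentum:", trajectory[-1])

The only change relative to the earlier script is replacing the plain update x -= lr * grad with the velocity-based update, so you can reuse the plotting code from above on this trajectory array to compare the two paths.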

Which of the following statements best describes why optimization paths can behave unpredictably in complex, high-dimensional landscapes?

Select the correct answer

