Mathematics of Optimization in ML: Convergence and Theoretical Guarantees

Visual Intuition for Convergence

When you optimize machine learning models, you rarely work with simple, smooth valleys. Most real-world loss surfaces are high-dimensional and filled with hills, valleys, plateaus, and saddle points. On these complex landscapes, the path your optimizer takes is rarely a straight line to the bottom. Instead, optimization trajectories may zigzag, spiral, or even get stuck for long periods before escaping to lower regions. This behavior is shaped by the geometry of the surface, such as the presence of sharp ridges, flat regions, or multiple local minima. In high dimensions, these effects become even more pronounced, and visualizing them directly is challenging. However, by studying lower-dimensional analogs, you can build strong geometric intuition for how and why optimizers behave as they do.
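At its core, plain gradient descent repeats a single update, x_new = x_old - learning_rate * gradient(x_old), so the direction of every step is dictated entirely by the local slope. This is why the surrounding geometry matters so much: narrow, curved valleys tend to make the path zigzag from wall to wall, while nearly flat plateaus and saddle points produce long stretches of very slow progress. The example below makes this visible on a small two-dimensional test surface.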

import numpy as np
import matplotlib.pyplot as plt

# Define a complex, multi-modal surface (e.g., Himmelblau's function)
def himmelblau(X):
    x, y = X
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

# Gradient of Himmelblau's function
def grad_himmelblau(X):
    x, y = X
    dx = 4 * x * (x**2 + y - 11) + 2 * (x + y**2 - 7)
    dy = 2 * (x**2 + y - 11) + 4 * y * (x + y**2 - 7)
    return np.array([dx, dy])

# Gradient descent parameters
lr = 0.01
steps = 100
start = np.array([-4.0, 4.0])
trajectory = [start.copy()]

# Perform gradient descent
x = start.copy()
for _ in range(steps):
    grad = grad_himmelblau(x)
    x -= lr * grad
    trajectory.append(x.copy())
trajectory = np.array(trajectory)

# Plot the surface and trajectory
xlist = np.linspace(-6, 6, 400)
ylist = np.linspace(-6, 6, 400)
X, Y = np.meshgrid(xlist, ylist)
Z = himmelblau([X, Y])

plt.figure(figsize=(8, 6))
plt.contour(X, Y, Z, levels=50, cmap="jet")
plt.plot(trajectory[:, 0], trajectory[:, 1], marker='o', color='red',
         markersize=3, linewidth=2, label='Gradient Descent Path')
plt.scatter([start[0]], [start[1]], color='green', s=80, label='Start')
plt.title("Optimization Trajectory on Himmelblau's Function")
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
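Running this code draws the level curves of Himmelblau's function and overlays the gradient descent path starting from the green marker at (-4, 4). Himmelblau's function has four minima of equal value, located approximately at (3.0, 2.0), (-2.81, 3.13), (-3.78, -3.28), and (3.58, -1.85), so there is no single bottom for the path to aim at: which basin the trajectory settles into depends on the starting point, the learning rate, and the shape of the surface along the way. Try changing start or lr and watch the path bend toward a different minimum, slow to a crawl in flatter regions, or overshoot and oscillate when the steps are too large.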
Note

To improve convergence in difficult optimization landscapes, you can:

  • Use momentum or adaptive optimizers to help escape flat regions or saddle points (see the sketch after this list);
  • Carefully tune the learning rate to avoid overshooting or getting stuck;
  • Apply learning rate schedules that decrease step size as you approach minima;
  • If possible, initialize parameters in diverse locations to reduce the risk of poor local minima.
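
As a minimal sketch of the first and third ideas, the snippet below adds classical (heavy-ball) momentum and a simple exponential learning rate decay to the same problem. It assumes the himmelblau and grad_himmelblau functions from the example above are already defined, and the momentum coefficient and decay factor are illustrative choices rather than tuned values.

import numpy as np

# Heavy-ball momentum with exponential learning rate decay on Himmelblau's function.
# Assumes himmelblau and grad_himmelblau from the example above are already defined.
lr = 0.01        # initial learning rate
decay = 0.99     # factor applied to the learning rate after every step
beta = 0.9       # momentum coefficient: fraction of the previous step to keep
steps = 100

x = np.array([-4.0, 4.0])
velocity = np.zeros_like(x)
trajectory = [x.copy()]

for _ in range(steps):
    grad = grad_himmelblau(x)
    velocity = beta * velocity - lr * grad   # accumulate a smoothed descent direction
    x = x + velocity                         # move along the accumulated direction
    lr *= decay                              # gradually shrink the step size
    trajectory.append(x.copy())

trajectory = np.array(trajectory)
print("Final point:", trajectory[-1])
print("Final loss:", himmelblau(trajectory[-1]))

Plotting this trajectory with the same contour code as above lets you compare it directly against the plain gradient descent path and see how the accumulated velocity and the shrinking step size change the shape of the route.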

Which of the following statements best describes why optimization paths can behave unpredictably in complex, high-dimensional landscapes?


Section 6. Chapter 3

