Visual Intuition for Convergence | Convergence and Theoretical Guarantees
Mathematics of Optimization in ML

Visual Intuition for Convergence

When you optimize machine learning models, you rarely work with simple, smooth valleys. Most real-world loss surfaces are high-dimensional and filled with hills, valleys, plateaus, and saddle points. On these complex landscapes, the path your optimizer takes is rarely a straight line to the bottom. Instead, optimization trajectories may zigzag, spiral, or even get stuck for long periods before escaping to lower regions. This behavior is shaped by the geometry of the surface—such as the presence of sharp ridges, flat regions, or multiple local minima. In high dimensions, these effects become even more pronounced, and visualizing them directly is challenging. However, by studying lower-dimensional analogs, you can build strong geometric intuition for how and why optimizers behave as they do.

import numpy as np
import matplotlib.pyplot as plt

# Define a complex, multi-modal surface (e.g., Himmelblau's function)
def himmelblau(X):
    x, y = X
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

# Gradient of Himmelblau's function
def grad_himmelblau(X):
    x, y = X
    dx = 4 * x * (x**2 + y - 11) + 2 * (x + y**2 - 7)
    dy = 2 * (x**2 + y - 11) + 4 * y * (x + y**2 - 7)
    return np.array([dx, dy])

# Gradient descent parameters
lr = 0.01
steps = 100
start = np.array([-4.0, 4.0])
trajectory = [start.copy()]

# Perform gradient descent
x = start.copy()
for _ in range(steps):
    grad = grad_himmelblau(x)
    x -= lr * grad
    trajectory.append(x.copy())
trajectory = np.array(trajectory)

# Plot the surface and trajectory
xlist = np.linspace(-6, 6, 400)
ylist = np.linspace(-6, 6, 400)
X, Y = np.meshgrid(xlist, ylist)
Z = himmelblau([X, Y])

plt.figure(figsize=(8, 6))
plt.contour(X, Y, Z, levels=50, cmap="jet")
plt.plot(trajectory[:, 0], trajectory[:, 1], marker='o', color='red',
         markersize=3, linewidth=2, label='Gradient Descent Path')
plt.scatter([start[0]], [start[1]], color='green', s=80, label='Start')
plt.title("Optimization Trajectory on Himmelblau's Function")
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
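
Because Himmelblau's function has four global minima, the basin you end up in depends on where you start. The short sketch below is not part of the lesson code above; the starting points and step count are arbitrary illustrative choices. It reruns the same plain gradient descent from four corners of the plotted region and prints where each run converges.

import numpy as np

def grad_himmelblau(X):
    x, y = X
    dx = 4 * x * (x**2 + y - 11) + 2 * (x + y**2 - 7)
    dy = 2 * (x**2 + y - 11) + 4 * y * (x + y**2 - 7)
    return np.array([dx, dy])

def descend(start, lr=0.01, steps=200):
    # Plain gradient descent, returning only the final iterate.
    x = np.array(start, dtype=float)
    for _ in range(steps):
        x -= lr * grad_himmelblau(x)
    return x

# Illustrative starting points (arbitrary choices, one per quadrant).
for start in [(-4.0, 4.0), (4.0, 4.0), (-4.0, -4.0), (4.0, -4.0)]:
    end = descend(start)
    print(f"start {start} -> converged near ({end[0]:.3f}, {end[1]:.3f})")

Each start lands in a different minimum, which is exactly the sensitivity to initialization that the contour plot above hints at.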
Note

To improve convergence in difficult optimization landscapes, you can:

  • Use momentum or adaptive optimizers to help escape flat regions or saddle points (a minimal momentum sketch follows this list);
  • Carefully tune the learning rate to avoid overshooting or getting stuck;
  • Apply learning rate schedules that decrease step size as you approach minima;
  • If possible, initialize parameters in diverse locations to reduce the risk of poor local minima.
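
As a concrete illustration of the first point, here is a minimal sketch of heavy-ball momentum applied to the same Himmelblau descent. The momentum coefficient 0.9, the learning rate, and the step count are illustrative choices rather than tuned values.

import numpy as np

def grad_himmelblau(X):
    x, y = X
    dx = 4 * x * (x**2 + y - 11) + 2 * (x + y**2 - 7)
    dy = 2 * (x**2 + y - 11) + 4 * y * (x + y**2 - 7)
    return np.array([dx, dy])

lr, beta, steps = 0.01, 0.9, 200   # illustrative hyperparameters
x = np.array([-4.0, 4.0])
velocity = np.zeros_like(x)

for _ in range(steps):
    # Accumulate an exponentially decaying sum of past gradients;
    # the accumulated velocity can carry the iterate across flat
    # regions and past saddle points where plain gradients are tiny.
    velocity = beta * velocity - lr * grad_himmelblau(x)
    x += velocity

print("final point:", x)

Recording the iterates and plotting them on the contour plot from the main example makes it easy to compare how momentum changes the shape of the path.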

Which of the following statements best describes why optimization paths can behave unpredictably in complex, high-dimensional landscapes?

Select the correct answer



