Mathematics of Optimization in ML: Convergence and Theoretical Guarantees

Visual Intuition for Convergence

When you optimize machine learning models, you rarely work with simple, smooth valleys. Most real-world loss surfaces are high-dimensional and filled with hills, valleys, plateaus, and saddle points. On these complex landscapes, the path your optimizer takes is rarely a straight line to the bottom. Instead, optimization trajectories may zigzag, spiral, or even get stuck for long periods before escaping to lower regions. This behavior is shaped by the geometry of the surface, such as the presence of sharp ridges, flat regions, or multiple local minima. In high dimensions, these effects become even more pronounced, and visualizing them directly is challenging. However, by studying lower-dimensional analogs, you can build strong geometric intuition for how and why optimizers behave as they do.
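At its core, plain gradient descent repeats a single update, x_new = x_old - learning_rate * gradient(x_old), so the direction of every step is dictated entirely by the local slope. This is why the surrounding geometry matters so much: narrow, curved valleys tend to make the path zigzag from wall to wall, while nearly flat plateaus and saddle points produce long stretches of very slow progress. The example below makes this visible on a small two-dimensional test surface.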

import numpy as np
import matplotlib.pyplot as plt

# Define a complex, multi-modal surface (e.g., Himmelblau's function)
def himmelblau(X):
    x, y = X
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

# Gradient of Himmelblau's function
def grad_himmelblau(X):
    x, y = X
    dx = 4 * x * (x**2 + y - 11) + 2 * (x + y**2 - 7)
    dy = 2 * (x**2 + y - 11) + 4 * y * (x + y**2 - 7)
    return np.array([dx, dy])

# Gradient descent parameters
lr = 0.01
steps = 100
start = np.array([-4.0, 4.0])
trajectory = [start.copy()]

# Perform gradient descent
x = start.copy()
for _ in range(steps):
    grad = grad_himmelblau(x)
    x -= lr * grad
    trajectory.append(x.copy())
trajectory = np.array(trajectory)

# Plot the surface and trajectory
xlist = np.linspace(-6, 6, 400)
ylist = np.linspace(-6, 6, 400)
X, Y = np.meshgrid(xlist, ylist)
Z = himmelblau([X, Y])

plt.figure(figsize=(8, 6))
plt.contour(X, Y, Z, levels=50, cmap="jet")
plt.plot(trajectory[:, 0], trajectory[:, 1], marker='o', color='red',
         markersize=3, linewidth=2, label='Gradient Descent Path')
plt.scatter([start[0]], [start[1]], color='green', s=80, label='Start')
plt.title("Optimization Trajectory on Himmelblau's Function")
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
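Running this code draws the level curves of Himmelblau's function and overlays the gradient descent path starting from the green marker at (-4, 4). Himmelblau's function has four minima of equal value, located approximately at (3.0, 2.0), (-2.81, 3.13), (-3.78, -3.28), and (3.58, -1.85), so there is no single bottom for the path to aim at: which basin the trajectory settles into depends on the starting point, the learning rate, and the shape of the surface along the way. Try changing start or lr and watch the path bend toward a different minimum, slow to a crawl in flatter regions, or overshoot and oscillate when the steps are too large.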
Note

To improve convergence in difficult optimization landscapes, you can:

  • Use momentum or adaptive optimizers to help escape flat regions or saddle points (see the sketch after this list);
  • Carefully tune the learning rate to avoid overshooting or getting stuck;
  • Apply learning rate schedules that decrease step size as you approach minima;
  • If possible, initialize parameters in diverse locations to reduce the risk of poor local minima.
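
As a minimal sketch of the first and third ideas, the snippet below adds classical (heavy-ball) momentum and a simple exponential learning rate decay to the same problem. It assumes the himmelblau and grad_himmelblau functions from the example above are already defined, and the momentum coefficient and decay factor are illustrative choices rather than tuned values.

import numpy as np

# Heavy-ball momentum with exponential learning rate decay on Himmelblau's function.
# Assumes himmelblau and grad_himmelblau from the example above are already defined.
lr = 0.01        # initial learning rate
decay = 0.99     # factor applied to the learning rate after every step
beta = 0.9       # momentum coefficient: fraction of the previous step to keep
steps = 100

x = np.array([-4.0, 4.0])
velocity = np.zeros_like(x)
trajectory = [x.copy()]

for _ in range(steps):
    grad = grad_himmelblau(x)
    velocity = beta * velocity - lr * grad   # accumulate a smoothed descent direction
    x = x + velocity                         # move along the accumulated direction
    lr *= decay                              # gradually shrink the step size
    trajectory.append(x.copy())

trajectory = np.array(trajectory)
print("Final point:", trajectory[-1])
print("Final loss:", himmelblau(trajectory[-1]))

Plotting this trajectory with the same contour code as above lets you compare it directly against the plain gradient descent path and see how the accumulated velocity and the shrinking step size change the shape of the route.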

Which of the following statements best describes why optimization paths can behave unpredictably in complex, high-dimensional landscapes?


Section 6. Chapter 3

