Non-Convex Landscapes and Saddle Points
When optimizing machine learning models, you often encounter objective functions that are non-convex. Unlike convex functions, where every local minimum is also a global minimum, non-convex functions can have many local minima, local maxima, and saddle points.
A non-convex function is one where the line segment (chord) between two points on the graph can dip below the graph itself; equivalently, the convexity inequality f(t·a + (1 − t)·b) ≤ t·f(a) + (1 − t)·f(b) fails for some points a, b and some t in [0, 1]. This lack of convexity introduces multiple regions where the gradient may vanish, but not all of these are optimal solutions. Specifically, you can encounter:
- Local minima: points where the function value is lower than at all nearby points, but not necessarily the lowest overall;
- Local maxima: points where the function value is higher than at all nearby points;
- Saddle points: points that are minima along some directions and maxima along others (a quick numerical check follows this list).
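To make the saddle-point case concrete, here is a minimal sketch using the same function that appears in the example below, f(x, y) = x² − y². At a critical point, the eigenvalues of the Hessian reveal the type: all positive means a local minimum, all negative a local maximum, and mixed signs a saddle point.

import numpy as np

# f(x, y) = x**2 - y**2 has a critical point at the origin:
# the gradient (2x, -2y) vanishes there.
# Its Hessian is constant, so we can inspect it directly.
hessian = np.array([[2.0, 0.0],
                    [0.0, -2.0]])

eigenvalues = np.linalg.eigvalsh(hessian)
print(eigenvalues)  # [-2.  2.]

# One positive and one negative eigenvalue: the surface curves upward
# along x (a minimum in that direction) and downward along y (a maximum),
# so the origin is a saddle point, not an optimum.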
In high-dimensional optimization problems, such as those found in deep learning, the loss landscape is often highly non-convex, so gradient-based optimization can get stuck in regions that are not global minima. Saddle points are particularly problematic: the gradient is zero there even though they are not optimal solutions.
In high-dimensional spaces, saddle points vastly outnumber local minima and maxima, so when optimizing neural networks or other complex models, a critical point you reach is far more likely to be a saddle point than a true minimum or maximum. Because the gradient vanishes at a saddle point and is small in its vicinity, gradient descent can stall or slow down significantly, making optimization challenging. The example below visualizes this on a simple two-dimensional surface.
import numpy as np
import matplotlib.pyplot as plt

# Define a non-convex function with a saddle point
def f(x, y):
    return x**2 - y**2

# Compute gradients for visualization
def grad_f(x, y):
    return np.array([2*x, -2*y])

# Generate a meshgrid for plotting
x = np.linspace(-2, 2, 100)
y = np.linspace(-2, 2, 100)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

# Simulate a gradient descent path starting near the saddle point
path = []
point = np.array([1.5, 1.5])
lr = 0.1
for _ in range(15):
    path.append(point.copy())
    grad = grad_f(point[0], point[1])
    point = point - lr * grad
path = np.array(path)

# Plot the surface and optimization path
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z, alpha=0.6, cmap='seismic')
ax.plot(path[:, 0], path[:, 1], f(path[:, 0], path[:, 1]),
        marker='o', color='k', label='Optimization Path')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('f(x, y)')
ax.set_title('Non-Convex Surface with Saddle Point and Optimization Path')
plt.legend()
plt.show()
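In the plot above, the starting point (1.5, 1.5) already has a component along the saddle's descending direction, so the iterates slide away along y. A minimal follow-up sketch, reusing f and grad_f from the example above with an assumed helper descend for the update loop, shows the stalling behavior more directly: starting exactly on the x-axis, gradient descent converges straight to the saddle at the origin and stalls, while even a tiny perturbation in y eventually escapes, although only slowly at first.

import numpy as np

def f(x, y):
    return x**2 - y**2

def grad_f(x, y):
    return np.array([2 * x, -2 * y])

# Hypothetical helper: plain gradient descent for a fixed number of steps.
def descend(start, lr=0.1, steps=60):
    point = np.array(start, dtype=float)
    for _ in range(steps):
        point = point - lr * grad_f(point[0], point[1])
    return point

# Starting exactly on the x-axis: each step shrinks x and leaves y at zero,
# so the iterates converge to the saddle point (0, 0) and stall there.
print(descend([1.5, 0.0]))    # approximately [0. 0.]

# A tiny perturbation in y eventually escapes along the descending direction,
# but progress is very slow while |y| is still small.
print(descend([1.5, 1e-6]))   # x is near 0, y has grown to about 0.06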