Non-Convex Landscapes and Saddle Points
Mathematics of Optimization in ML | Convergence and Theoretical Guarantees

When optimizing machine learning models, you often encounter objective functions that are non-convex. Unlike convex functions, where every local minimum is also a global minimum, non-convex functions can have many local minima, local maxima, and saddle points.

A non-convex function is one where the line segment between some pair of points on the graph falls below the graph itself. This lack of convexity creates multiple critical points where the gradient vanishes, but not all of them are optimal solutions. Specifically, you can encounter the following (a short classification sketch follows the list):

  • Local minima: points where the function value is lower than at all nearby points, but not necessarily the lowest overall;
  • Local maxima: points where the function value is higher than at all nearby points;
  • Saddle points: critical points that behave like a minimum along some directions and a maximum along others.
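
These cases can be told apart by the eigenvalues of the Hessian at a critical point: all positive indicates a local minimum, all negative a local maximum, and mixed signs a saddle. Here is a minimal sketch, assuming the saddle function f(x, y) = x**2 - y**2 used in the example further below, whose Hessian is constant:

import numpy as np

# Hessian of f(x, y) = x**2 - y**2 (constant everywhere)
H = np.array([[2.0, 0.0],
              [0.0, -2.0]])

# Eigenvalues of a symmetric matrix, in ascending order
eigvals = np.linalg.eigvalsh(H)

if np.all(eigvals > 0):
    kind = "local minimum"
elif np.all(eigvals < 0):
    kind = "local maximum"
else:
    kind = "saddle point"

print(eigvals, "->", kind)  # [-2.  2.] -> saddle point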

In high-dimensional optimization problems, such as those found in deep learning, the landscape of the loss function is often highly non-convex. This means that as you perform gradient-based optimization, you may get stuck in regions that are not global minima. The presence of saddle points is particularly problematic because the gradient is zero at these points, but they do not represent optimal solutions.

Note

In high-dimensional spaces, saddle points vastly outnumber local minima and maxima. This means that when optimizing neural networks or other complex models, you are much more likely to encounter saddle points than true minima or maxima. At a saddle point, the gradient vanishes, so gradient descent can stall or slow down significantly, making optimization challenging.
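
To see this stall concretely, run plain gradient descent on the same saddle function f(x, y) = x**2 - y**2, starting exactly on its stable direction (y = 0): the iterates converge to the saddle at the origin, where the gradient vanishes even though the point is not a minimum. A minimal sketch; the starting point and step size are illustrative choices:

import numpy as np

def grad_f(p):
    # Gradient of f(x, y) = x**2 - y**2
    return np.array([2 * p[0], -2 * p[1]])

# Starting on the stable direction (y = 0), each step shrinks x
# by a factor of 0.8, so the iterates approach the saddle at (0, 0)
p = np.array([1.5, 0.0])
lr = 0.1
for _ in range(50):
    p = p - lr * grad_f(p)

print(p)                          # ~ [0, 0]: the saddle, not a minimum
print(np.linalg.norm(grad_f(p)))  # gradient norm ~ 0, so updates stall

Any tiny perturbation in y would eventually grow and carry the iterates away from the saddle, which is why escape is slow rather than impossible in practice.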

import numpy as np
import matplotlib.pyplot as plt

# Define a non-convex function with a saddle point
def f(x, y):
    return x**2 - y**2

# Compute gradients for visualization
def grad_f(x, y):
    return np.array([2*x, -2*y])

# Generate a meshgrid for plotting
x = np.linspace(-2, 2, 100)
y = np.linspace(-2, 2, 100)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

# Simulate a gradient descent path starting near the saddle point
path = []
point = np.array([1.5, 1.5])
lr = 0.1
for _ in range(15):
    path.append(point.copy())
    grad = grad_f(point[0], point[1])
    point = point - lr * grad
path = np.array(path)

# Plot the surface and optimization path
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z, alpha=0.6, cmap='seismic')
ax.plot(path[:, 0], path[:, 1], f(path[:, 0], path[:, 1]),
        marker='o', color='k', label='Optimization Path')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('f(x, y)')
ax.set_title('Non-Convex Surface with Saddle Point and Optimization Path')
plt.legend()
plt.show()
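
With the starting point (1.5, 1.5) and step size 0.1, each update contracts the x coordinate by a factor of 0.8 while expanding the y coordinate by a factor of 1.2. The plotted path therefore slides toward the saddle along x while accelerating away along y, which is exactly the escape along the direction of negative curvature described above; note that after a few steps the path leaves the plotted [-2, 2] range, since y grows geometrically.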