Mathematics for Data Analysis and Modeling
Gradient Descent Method
We know how to solve optimization problems for a function of one variable using the algorithm described in the previous chapter. But what can we do if we have a function of multiple variables? We can turn to a numerical method: gradient descent.
What is gradient descent?
The gradient is a vector that consists of all partial derivatives of the function:

∇F(x1, ..., xn) = (∂F/∂x1, ..., ∂F/∂xn)
Thus, the problem of minimisation of the function F(x1, ..., xn) can be solved by constructing the following sequence of approximations:

x(k+1) = x(k) − η ∇F(x(k))

We set a certain initial value x0 and a value η representing the speed of gradient descent (the learning rate). Then we start the iterative process according to the formula above.
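The iteration above can be sketched in a few lines of Python. The function, its gradient, and the parameter values here are illustrative choices for the sketch, not part of the course material:

```python
import numpy as np

def gradient_descent(grad, x0, eta=0.1, max_iter=1000, eps=1e-6):
    """Minimal gradient descent: x(k+1) = x(k) - eta * grad(x(k))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = eta * grad(x)
        x = x - step
        if np.linalg.norm(step) < eps:  # stop once the updates become tiny
            break
    return x

# Minimise F(x, y) = (x - 1)^2 + (y + 2)^2, whose gradient is (2(x - 1), 2(y + 2))
x_min = gradient_descent(lambda x: np.array([2 * (x[0] - 1), 2 * (x[1] + 2)]),
                         x0=[0.0, 0.0])
print(x_min)  # close to [1, -2], the true minimum point
```

Because the example function is convex with a single minimum, the iterates converge to it for this step size; for harder functions the choice of η matters much more.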
Stop criteria of the algorithm
The criteria for stopping iterations can be as follows:
- Stop the algorithm after a certain number of iterations;
- Iterate until the following condition is met:

  ‖x(k+1) − x(k)‖ < ε
Note
eps = 10**(-6) or eps = 10**(-9) values are commonly used as the stop criterion of the iteration process.
We have to pay attention to two important features of the gradient descent method:
- This method can only find a point of minimum of the function F(x). If you want to find a point of maximum, you can consider the function −F(x) and use gradient descent for it;
- If we compare the algorithm we discussed earlier with gradient descent, we can see that gradient descent performs a task similar to the first stage of that algorithm: finding a critical value, which might be a potential minimum point. As a result, the point found by gradient descent may be a local minimum within some subset of the domain, or not a minimum point at all.
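The first point above can be illustrated directly: to maximise a function, descend on its negation. The function F(x) = −(x − 3)² and the parameter values are illustrative choices, not from the course:

```python
# Maximise F(x) = -(x - 3)^2 (its peak is at x = 3) by applying
# gradient descent to -F(x) = (x - 3)^2, whose derivative is 2*(x - 3).
def neg_F_grad(x):
    return 2.0 * (x - 3.0)

x = 0.0      # initial guess x0
eta = 0.1    # speed of gradient descent
for _ in range(500):
    x = x - eta * neg_F_grad(x)

print(round(x, 4))  # near 3.0, the point of maximum of F
```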
Example
Let's find out how to solve the optimization problem in Python:
```python
import numpy as np
from scipy.optimize import minimize

# Define the Rosenbrock function
def rosenbrock(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

# Initial guess for x and y
x0 = np.array([2, 2])

# Use scipy.optimize.minimize to find the minimum of the Rosenbrock function
result = minimize(rosenbrock, x0)

# Extract the optimal x and the minimum value of the Rosenbrock function
optimal_x = result.x
min_value = result.fun

print(f'Optimal x: {optimal_x}')
print(f'Minimum value of the Rosenbrock function: {min_value:.4f}')
```
In this example, we define the Rosenbrock function, set an initial guess for x, and then use scipy.optimize.minimize to find its minimum. The result.x attribute contains the optimal x, and result.fun contains the minimum value of the Rosenbrock function.
Note
The Rosenbrock function is often used as a benchmark for testing and comparing optimization algorithms due to its non-convex nature and the presence of a narrow, curved minimum valley.