Gradient Descent MethodGradient Descent Method

We know how to solve optimization problems for the function of one variable using the algorithm described in the previous chapter. But what can we do if we have a function of multiple variables? We can turn to a numerical method - gradient descent.

What is gradient descent?

Gradient is a vector that consists of all partial derivatives of the function:


Thus, the problem of minimisation of function F(x1,..,xn) can be solved by constructing the following sequence of approximations:


We set a certain initial value x0 and η value representing the speed of gradient descent. Then we start the iterative process according to the formula above.

Stop criteria of the algorithm

The criteria for stopping iterations can be as follows:

  1. Stop the algorithm after a certain number of iterations.
  2. Iterate until the following condition is met:


eps = 10**(-6) or eps = 10**(-9) values are commonly used as the stop criterion of the iteration process.

We have to pay attention to two important features of the gradient descent method:

  1. This method can only find the point of minimum of the function F(x). If you want to find a point of maximum, you can consider function -F(x) and use gradient descent for it.
  2. If we compare the algorithm we discussed earlier with gradient descent, we can see that gradient descent performs a similar task to the first stage of the algorithm - finding a critical value, which might be a potential minimum point. As a result, it is possible that the point found by gradient descent may either be a local minimum within some subset of the domain or not a minimum point at all.


Let's find out how to solve the optimization problem in Python:

In this example, we define the Rosenbrock function, set an initial guess for x, and then use scipy.optimize.minimize to find the minimum of the Rosenbrock function. The result.x attribute contains the optimal x, and contains the minimum value of the Rosenbrock function.


The Rosenbrock function is often used as a benchmark for testing and comparing optimization algorithms due to its non-convex nature and the presence of a narrow, curved minimum valley.

Everything was clear?

Section 3. Chapter 6