Mathematics for Data Analysis and Modeling
Gradient Descent Method
We know how to solve optimization problems for a function of one variable using the algorithm described in the previous chapter. But what can we do if we have a function of multiple variables? We can turn to a numerical method: gradient descent.
What is gradient descent?
The gradient is a vector that consists of all partial derivatives of the function:

∇F(x1, ..., xn) = (∂F/∂x1, ..., ∂F/∂xn)
Thus, the problem of minimisation of the function F(x1, ..., xn) can be solved by constructing the following sequence of approximations:

x(k+1) = x(k) − η ∇F(x(k))

We set a certain initial value x0 and a value η representing the speed of gradient descent (the learning rate). Then we start the iterative process according to the formula above.
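The iteration above can be sketched in a few lines of Python. The function, its gradient, and the parameter values here are illustrative choices for the sketch, not part of the course material:

```python
import numpy as np

def gradient_descent(grad, x0, eta=0.1, max_iter=1000, eps=1e-6):
    """Minimal gradient descent: x(k+1) = x(k) - eta * grad(x(k))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = eta * grad(x)
        x = x - step
        if np.linalg.norm(step) < eps:  # stop once the updates become tiny
            break
    return x

# Minimise F(x, y) = (x - 1)^2 + (y + 2)^2, whose gradient is (2(x - 1), 2(y + 2))
x_min = gradient_descent(lambda x: np.array([2 * (x[0] - 1), 2 * (x[1] + 2)]),
                         x0=[0.0, 0.0])
print(x_min)  # close to [1, -2], the true minimum point
```

Because the example function is convex with a single minimum, the iterates converge to it for this step size; for harder functions the choice of η matters much more.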
Stop criteria of the algorithm
The criteria for stopping iterations can be as follows:
- Stop the algorithm after a certain number of iterations;
- Iterate until the following condition is met:

  ‖x(k+1) − x(k)‖ < ε
Note
eps = 10**(-6) or eps = 10**(-9) values are commonly used as the stop criterion of the iteration process.
We have to pay attention to two important features of the gradient descent method:
- This method can only find a point of minimum of the function F(x). If you want to find a point of maximum, you can consider the function −F(x) and use gradient descent for it;
- If we compare the algorithm we discussed earlier with gradient descent, we can see that gradient descent performs a task similar to the first stage of that algorithm: finding a critical value, which might be a potential minimum point. As a result, the point found by gradient descent may be a local minimum within some subset of the domain, or not a minimum point at all.
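The first point above can be illustrated directly: to maximise a function, descend on its negation. The function F(x) = −(x − 3)² and the parameter values are illustrative choices, not from the course:

```python
# Maximise F(x) = -(x - 3)^2 (its peak is at x = 3) by applying
# gradient descent to -F(x) = (x - 3)^2, whose derivative is 2*(x - 3).
def neg_F_grad(x):
    return 2.0 * (x - 3.0)

x = 0.0      # initial guess x0
eta = 0.1    # speed of gradient descent
for _ in range(500):
    x = x - eta * neg_F_grad(x)

print(round(x, 4))  # near 3.0, the point of maximum of F
```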
Example
Let's find out how to solve the optimization problem in Python:
```python
import numpy as np
from scipy.optimize import minimize

# Define the Rosenbrock function
def rosenbrock(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

# Initial guess for x and y
x0 = np.array([2, 2])

# Use scipy.optimize.minimize to find the minimum of the Rosenbrock function
result = minimize(rosenbrock, x0)

# Extract the optimal x and the minimum value of the Rosenbrock function
optimal_x = result.x
min_value = result.fun

print(f'Optimal x: {optimal_x}')
print(f'Minimum value of the Rosenbrock function: {min_value:.4f}')
```
In this example, we define the Rosenbrock function, set an initial guess for x, and then use scipy.optimize.minimize to find its minimum. The result.x attribute contains the optimal x, and result.fun contains the minimum value of the Rosenbrock function.
Note
The Rosenbrock function is often used as a benchmark for testing and comparing optimization algorithms due to its non-convex nature and the presence of a narrow, curved minimum valley.