Mathematics of Optimization in ML

Derivatives and Gradients

Derivatives and gradients form the mathematical backbone of optimization in machine learning. A derivative measures how a function changes as its input changes. In one dimension, the derivative of a function f(x) at a point x tells you the rate at which f(x) increases or decreases as you move slightly from x. When dealing with functions of multiple variables, such as f(x, y), the concept generalizes to partial derivatives, which capture the rate of change of the function with respect to each variable independently, holding the others constant.
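
As a quick illustration of this definition, the sketch below approximates partial derivatives numerically with central finite differences, using the same function f(x, y) = x^2 + y^2 that appears later in this chapter; the step size h is an arbitrary illustrative choice.

def f(x, y):
    return x**2 + y**2

def partial_x(f, x, y, h=1e-5):
    # Change in f as x varies slightly, with y held constant
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(f, x, y, h=1e-5):
    # Change in f as y varies slightly, with x held constant
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

print(partial_x(f, 1.0, 2.0))  # approximately 2.0, matching df/dx = 2x
print(partial_y(f, 1.0, 2.0))  # approximately 4.0, matching df/dy = 2y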

A gradient is a vector that collects all the partial derivatives of a function with respect to its inputs. For a function f(x, y), the gradient is written as:

\nabla f(x, y) = \left[ \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right]
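
For example, for f(x, y) = x^2 + y^2 (the function visualized in the code below), the partial derivatives are \partial f / \partial x = 2x and \partial f / \partial y = 2y, so \nabla f(x, y) = [2x, 2y]; at the point (1, 2) the gradient is [2, 4].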

This vector points in the direction of the greatest rate of increase of the function. In optimization, gradients are essential: they guide algorithms on how to adjust parameters to minimize or maximize an objective function. When you hear about moving in the direction of the negative gradient, it means taking steps that most rapidly decrease the value of the function, which is the core idea behind gradient-based optimization methods.

Note

Think of the gradient as a compass that always points toward the direction of steepest ascent on a surface. If you want to climb a hill as quickly as possible, you would follow the direction of the gradient. Conversely, to descend as quickly as possible, like minimizing a loss function in machine learning, you go in the opposite direction, following the negative gradient.

import numpy as np
import matplotlib.pyplot as plt

# Define a simple 2D quadratic function: f(x, y) = x^2 + y^2
def f(x, y):
    return x**2 + y**2

# Compute the gradient of f
def grad_f(x, y):
    df_dx = 2 * x
    df_dy = 2 * y
    return np.array([df_dx, df_dy])

# Create a grid of points
x = np.linspace(-3, 3, 20)
y = np.linspace(-3, 3, 20)
X, Y = np.meshgrid(x, y)

# Compute gradients at each grid point
U = 2 * X
V = 2 * Y

plt.figure(figsize=(6, 6))
plt.quiver(X, Y, U, V, color="blue")
plt.title("Gradient Vector Field of $f(x, y) = x^2 + y^2$")
plt.xlabel("x")
plt.ylabel("y")
plt.grid(True)
plt.show()
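
The vector field above points away from the origin, so stepping against it moves you toward the minimum. Below is a minimal gradient descent sketch on the same function f(x, y) = x^2 + y^2; the starting point, learning rate, and number of steps are arbitrary illustrative choices.

import numpy as np

# Gradient of f(x, y) = x^2 + y^2, as defined above
def grad_f(x, y):
    return np.array([2 * x, 2 * y])

point = np.array([2.5, -1.5])  # arbitrary starting point
learning_rate = 0.1            # arbitrary step size

for step in range(25):
    # Move against the gradient: the direction of steepest decrease of f
    point = point - learning_rate * grad_f(point[0], point[1])

print(point)  # close to [0, 0], the minimum of f

For this quadratic, each update scales both coordinates by the same factor 1 - 2 * learning_rate, so the iterate shrinks steadily toward the minimizer at the origin.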

