Mathematics of Optimization in ML

Derivatives and Gradients

Derivatives and gradients form the mathematical backbone of optimization in machine learning. A derivative measures how a function changes as its input changes. In one dimension, the derivative of a function f(x) at a point x tells you the rate at which f(x) increases or decreases as you move slightly from x. When dealing with functions of multiple variables, such as f(x, y), the concept generalizes to partial derivatives, which capture the rate of change of the function with respect to each variable independently, holding the others constant.
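
As a quick illustration of this definition, the sketch below approximates partial derivatives numerically with central finite differences, using the same function f(x, y) = x^2 + y^2 that appears later in this chapter; the step size h is an arbitrary illustrative choice.

def f(x, y):
    return x**2 + y**2

def partial_x(f, x, y, h=1e-5):
    # Change in f as x varies slightly, with y held constant
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(f, x, y, h=1e-5):
    # Change in f as y varies slightly, with x held constant
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

print(partial_x(f, 1.0, 2.0))  # approximately 2.0, matching df/dx = 2x
print(partial_y(f, 1.0, 2.0))  # approximately 4.0, matching df/dy = 2y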

A gradient is a vector that collects all the partial derivatives of a function with respect to its inputs. For a function f(x, y), the gradient is written as:

\nabla f(x, y) = \left[ \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right]
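
For example, for f(x, y) = x^2 + y^2 (the function visualized in the code below), the partial derivatives are \partial f / \partial x = 2x and \partial f / \partial y = 2y, so \nabla f(x, y) = [2x, 2y]; at the point (1, 2) the gradient is [2, 4].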

This vector points in the direction of the greatest rate of increase of the function. In optimization, gradients are essential: they guide algorithms on how to adjust parameters to minimize or maximize an objective function. When you hear about moving in the direction of the negative gradient, it means taking steps that most rapidly decrease the value of the function, which is the core idea behind gradient-based optimization methods.

Note

Think of the gradient as a compass that always points toward the direction of steepest ascent on a surface. If you want to climb a hill as quickly as possible, you would follow the direction of the gradient. Conversely, to descend as quickly as possible, like minimizing a loss function in machine learning, you go in the opposite direction, following the negative gradient.

import numpy as np
import matplotlib.pyplot as plt

# Define a simple 2D quadratic function: f(x, y) = x^2 + y^2
def f(x, y):
    return x**2 + y**2

# Compute the gradient of f
def grad_f(x, y):
    df_dx = 2 * x
    df_dy = 2 * y
    return np.array([df_dx, df_dy])

# Create a grid of points
x = np.linspace(-3, 3, 20)
y = np.linspace(-3, 3, 20)
X, Y = np.meshgrid(x, y)

# Compute gradients at each grid point
U = 2 * X
V = 2 * Y

plt.figure(figsize=(6, 6))
plt.quiver(X, Y, U, V, color="blue")
plt.title("Gradient Vector Field of $f(x, y) = x^2 + y^2$")
plt.xlabel("x")
plt.ylabel("y")
plt.grid(True)
plt.show()
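
The vector field above points away from the origin, so stepping against it moves you toward the minimum. Below is a minimal gradient descent sketch on the same function f(x, y) = x^2 + y^2; the starting point, learning rate, and number of steps are arbitrary illustrative choices.

import numpy as np

# Gradient of f(x, y) = x^2 + y^2, as defined above
def grad_f(x, y):
    return np.array([2 * x, 2 * y])

point = np.array([2.5, -1.5])  # arbitrary starting point
learning_rate = 0.1            # arbitrary step size

for step in range(25):
    # Move against the gradient: the direction of steepest decrease of f
    point = point - learning_rate * grad_f(point[0], point[1])

print(point)  # close to [0, 0], the minimum of f

For this quadratic, each update scales both coordinates by the same factor 1 - 2 * learning_rate, so the iterate shrinks steadily toward the minimizer at the origin.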

