Learn Derivatives and Gradients | Mathematical Foundations
Mathematics of Optimization in ML

Derivatives and Gradients

Derivatives and gradients form the mathematical backbone of optimization in machine learning. A derivative measures how a function changes as its input changes. In one dimension, the derivative of a function f(x) at a point x tells you the rate at which f(x) increases or decreases as you move slightly from x. When dealing with functions of multiple variables, such as f(x, y), the concept generalizes to partial derivatives, which capture the rate of change of the function with respect to each variable independently, holding the others constant.
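To make partial derivatives concrete, here is a small numerical sketch (not part of the original lesson) that approximates ∂f/∂x and ∂f/∂y for f(x, y) = x² + y² using central differences; the sample point (1, 2) and the step size h are arbitrary illustrative choices.

def f(x, y):
    return x**2 + y**2

def numerical_partials(f, x, y, h=1e-5):
    # Central-difference approximations of ∂f/∂x and ∂f/∂y
    df_dx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    df_dy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return df_dx, df_dy

# The exact partials at (1, 2) are 2x = 2 and 2y = 4
print(numerical_partials(f, 1.0, 2.0))  # approximately (2.0, 4.0)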

A gradient is a vector that collects all the partial derivatives of a function with respect to its inputs. For a function f(x, y), the gradient is written as:

\nabla f(x, y) = \left[ \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right]

This vector points in the direction of the greatest rate of increase of the function. In optimization, gradients are essential: they guide algorithms on how to adjust parameters to minimize or maximize an objective function. When you hear about moving in the direction of the negative gradient, it means taking steps that most rapidly decrease the value of the function, which is the core idea behind gradient-based optimization methods.

Note

Think of the gradient as a compass that always points toward the direction of steepest ascent on a surface. If you want to climb a hill as quickly as possible, you would follow the direction of the gradient. Conversely, to descend as quickly as possible, like minimizing a loss function in machine learning, you go in the opposite direction, following the negative gradient.
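As a minimal sketch of that idea (assuming the same function f(x, y) = x² + y² used in the plot below), the following loop repeatedly steps in the direction of the negative gradient; the learning rate, starting point, and number of iterations are illustrative values, not prescribed by the lesson.

import numpy as np

def grad_f(x, y):
    # Gradient of f(x, y) = x^2 + y^2
    return np.array([2 * x, 2 * y])

point = np.array([2.5, -1.5])  # arbitrary starting point
learning_rate = 0.1            # illustrative step size

for _ in range(25):
    # Step against the gradient to decrease f as quickly as possible
    point = point - learning_rate * grad_f(point[0], point[1])

print(point)  # approaches the minimum at (0, 0)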

import numpy as np
import matplotlib.pyplot as plt

# Define a simple 2D quadratic function: f(x, y) = x^2 + y^2
def f(x, y):
    return x**2 + y**2

# Compute the gradient of f analytically: ∇f = [2x, 2y]
def grad_f(x, y):
    df_dx = 2 * x
    df_dy = 2 * y
    return np.array([df_dx, df_dy])

# Create a grid of points
x = np.linspace(-3, 3, 20)
y = np.linspace(-3, 3, 20)
X, Y = np.meshgrid(x, y)

# Compute gradients at each grid point
U = 2 * X
V = 2 * Y

# Plot the gradient vector field
plt.figure(figsize=(6, 6))
plt.quiver(X, Y, U, V, color="blue")
plt.title("Gradient Vector Field of $f(x, y) = x^2 + y^2$")
plt.xlabel("x")
plt.ylabel("y")
plt.grid(True)
plt.show()
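In the resulting quiver plot, every arrow points radially away from the origin, which is the direction of steepest ascent for x² + y²; reversing those arrows gives the directions a gradient-descent step would follow toward the minimum at (0, 0).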

Which statement best describes the role of the gradient in optimization?


