Learn Derivatives and Gradients | Mathematical Foundations
Mathematics of Optimization in ML

Derivatives and Gradients

Derivatives and gradients form the mathematical backbone of optimization in machine learning. A derivative measures how a function changes as its input changes. In one dimension, the derivative of a function f(x) at a point x tells you the rate at which f(x) increases or decreases as you move slightly from x. When dealing with functions of multiple variables, such as f(x, y), the concept generalizes to partial derivatives, which capture the rate of change of the function with respect to each variable independently, holding the others constant.
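To make partial derivatives concrete, here is a small numerical sketch (not part of the original lesson) that approximates ∂f/∂x and ∂f/∂y for f(x, y) = x² + y² using central differences; the sample point (1, 2) and the step size h are arbitrary illustrative choices.

def f(x, y):
    return x**2 + y**2

def numerical_partials(f, x, y, h=1e-5):
    # Central-difference approximations of ∂f/∂x and ∂f/∂y
    df_dx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    df_dy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return df_dx, df_dy

# The exact partials at (1, 2) are 2x = 2 and 2y = 4
print(numerical_partials(f, 1.0, 2.0))  # approximately (2.0, 4.0)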

A gradient is a vector that collects all the partial derivatives of a function with respect to its inputs. For a function f(x, y), the gradient is written as:

\nabla f(x, y) = \left[ \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right]

This vector points in the direction of the greatest rate of increase of the function. In optimization, gradients are essential: they guide algorithms on how to adjust parameters to minimize or maximize an objective function. When you hear about moving in the direction of the negative gradient, it means taking steps that most rapidly decrease the value of the function, which is the core idea behind gradient-based optimization methods.

Note

Think of the gradient as a compass that always points toward the direction of steepest ascent on a surface. If you want to climb a hill as quickly as possible, you would follow the direction of the gradient. Conversely, to descend as quickly as possible, like minimizing a loss function in machine learning, you go in the opposite direction, following the negative gradient.
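As a minimal sketch of that idea (assuming the same function f(x, y) = x² + y² used in the plot below), the following loop repeatedly steps in the direction of the negative gradient; the learning rate, starting point, and number of iterations are illustrative values, not prescribed by the lesson.

import numpy as np

def grad_f(x, y):
    # Gradient of f(x, y) = x^2 + y^2
    return np.array([2 * x, 2 * y])

point = np.array([2.5, -1.5])  # arbitrary starting point
learning_rate = 0.1            # illustrative step size

for _ in range(25):
    # Step against the gradient to decrease f as quickly as possible
    point = point - learning_rate * grad_f(point[0], point[1])

print(point)  # approaches the minimum at (0, 0)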

import numpy as np
import matplotlib.pyplot as plt

# Define a simple 2D quadratic function: f(x, y) = x^2 + y^2
def f(x, y):
    return x**2 + y**2

# Compute the gradient of f analytically: ∇f = [2x, 2y]
def grad_f(x, y):
    df_dx = 2 * x
    df_dy = 2 * y
    return np.array([df_dx, df_dy])

# Create a grid of points
x = np.linspace(-3, 3, 20)
y = np.linspace(-3, 3, 20)
X, Y = np.meshgrid(x, y)

# Compute gradients at each grid point
U = 2 * X
V = 2 * Y

# Plot the gradient vector field
plt.figure(figsize=(6, 6))
plt.quiver(X, Y, U, V, color="blue")
plt.title("Gradient Vector Field of $f(x, y) = x^2 + y^2$")
plt.xlabel("x")
plt.ylabel("y")
plt.grid(True)
plt.show()
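In the resulting quiver plot, every arrow points radially away from the origin, which is the direction of steepest ascent for x² + y²; reversing those arrows gives the directions a gradient-descent step would follow toward the minimum at (0, 0).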

Which statement best describes the role of the gradient in optimization?


