Mathematics for Data Science with Python

Implementing Gradient Descent in Python


Gradient descent follows a simple but powerful idea: move in the direction of steepest descent to minimize a function.

The mathematical rule is:

theta = theta - alpha * gradient(theta)

Where:

  • theta is the parameter we are optimizing;
  • alpha is the learning rate (step size);
  • gradient(theta) is the gradient of the function at theta.
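
To make the rule concrete, here is one update step computed by hand, using the same values this lesson uses below (theta = 3.0, alpha = 0.3, and f(theta) = theta**2, whose gradient is 2 * theta):

theta = 3.0  # Current parameter value
alpha = 0.3  # Learning rate
grad = 2 * theta  # Gradient of f(theta) = theta**2 at theta = 3.0, i.e. 6.0
theta = theta - alpha * grad  # 3.0 - 0.3 * 6.0 = 1.2
print(theta)  # 1.2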

1. Define the Function and Its Derivative

We start with a simple quadratic function:

def f(theta):
    return theta**2  # Function we want to minimize

Its derivative (gradient) is:

def gradient(theta):
    return 2 * theta  # Derivative: f'(theta) = 2*theta
  • f(theta): this is our function, and we want to find the value of theta that minimizes it;
  • gradient(theta): this tells us the slope at any point theta, which we use to determine the update direction.
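
For example, evaluating both functions at a few points shows how the sign of the gradient determines the update direction: a positive slope pushes theta down, a negative slope pushes it up:

print(f(3.0))          # 9.0: function value at theta = 3.0
print(gradient(3.0))   # 6.0: positive slope, so the update decreases theta
print(gradient(-3.0))  # -6.0: negative slope, so the update increases theta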

2. Initialize Gradient Descent Parameters

alpha = 0.3  # Learning rate
theta = 3.0  # Initial starting point
tolerance = 1e-5  # Convergence threshold
max_iterations = 20  # Maximum number of updates
  • alpha (learning rate): controls how big each step is;
  • theta (initial guess): the starting point for descent;
  • tolerance: when the updates become tiny, we stop;
  • max_iterations: ensures we don't loop forever.
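
The learning rate deserves special attention. For f(theta) = theta**2 the update simplifies to theta ← (1 - 2 * alpha) * theta, so the iterates shrink toward 0 only when |1 - 2 * alpha| < 1, i.e. when 0 < alpha < 1. Here is a small sketch illustrating this (the helper name run_descent is just for illustration, not part of the lesson's code):

def run_descent(alpha, theta=3.0, steps=5):
    # Apply the update rule theta <- theta - alpha * 2 * theta a few times
    for _ in range(steps):
        theta = theta - alpha * (2 * theta)
    return theta

print(run_descent(0.3))  # ~0.031: converges toward 0 (shrink factor 1 - 0.6 = 0.4)
print(run_descent(1.1))  # ~-7.46: diverges (factor 1 - 2.2 = -1.2, |factor| > 1)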

3. Perform Gradient Descent

for i in range(max_iterations):
    grad = gradient(theta)  # Compute gradient
    new_theta = theta - alpha * grad  # Update rule
    print(f"Iteration {i + 1}: theta = {new_theta:.6f}")  # Monitor progress
    if abs(new_theta - theta) < tolerance:
        print("Converged!")
        break
    theta = new_theta
  • Calculate the gradient at theta;
  • Update theta using the gradient descent formula;
  • Stop when updates are too small (convergence);
  • Print each step to monitor progress.
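
For this particular function the loop's behavior can also be checked analytically: each update multiplies theta by 1 - 2 * alpha = 0.4, so after n steps theta = 3.0 * 0.4**n. A quick sketch comparing the loop against this closed form:

theta = 3.0
for n in range(1, 6):
    theta = theta - 0.3 * (2 * theta)  # One gradient descent update
    print(n, theta, 3.0 * 0.4**n)      # The two values match at every step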

4. Visualizing Gradient Descent

import matplotlib.pyplot as plt
import numpy as np

def f(theta):
    return theta**2  # Function we want to minimize

def gradient(theta):
    return 2 * theta  # Derivative: f'(theta) = 2*theta

alpha = 0.3  # Learning rate
theta = 3.0  # Initial starting point
tolerance = 1e-5  # Convergence threshold
max_iterations = 20  # Maximum number of updates

theta_values = [theta]  # Track parameter values
output_values = [f(theta)]  # Track function values

for i in range(max_iterations):
    grad = gradient(theta)  # Compute gradient
    new_theta = theta - alpha * grad  # Update rule
    if abs(new_theta - theta) < tolerance:
        break
    theta = new_theta
    theta_values.append(theta)
    output_values.append(f(theta))

# Prepare data for plotting the full function curve
theta_range = np.linspace(-4, 4, 100)
output_range = f(theta_range)

# Plot
plt.plot(theta_range, output_range, label="f(θ) = θ²", color='black')
plt.scatter(theta_values, output_values, color='red', label="Gradient Descent Steps")
plt.title("Gradient Descent Visualization")
plt.xlabel("θ")
plt.ylabel("f(θ)")
plt.legend()
plt.grid(True)
plt.show()

This plot shows:

  • The function curve f(θ) = θ²;
  • Red dots representing each gradient descent step until convergence.
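
As an optional tweak (plain matplotlib, no new dependencies), connecting consecutive steps with a dashed line makes it easier to see that the steps shrink as θ approaches the minimum, since the gradient itself shrinks there. Add this line before plt.show() (note that alpha here is matplotlib's transparency setting, unrelated to the learning rate):

plt.plot(theta_values, output_values, color='red', linestyle='--', alpha=0.5)  # Dashed path between steps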