Implementing Gradient Descent in Python
Gradient descent follows a simple but powerful idea: move in the direction of steepest descent to minimize a function.
The mathematical update rule, written as code, is:
theta = theta - alpha * gradient(theta)
Where:
- theta is the parameter we are optimizing;
- alpha is the learning rate (step size);
- gradient(theta) is the gradient of the function at theta.
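To make the rule concrete, here is a single update computed by hand, using the values from the sections below (theta = 3.0, alpha = 0.3, and f(theta) = theta**2, whose gradient at 3.0 is 6.0):

theta = 3.0
alpha = 0.3
grad = 2 * theta               # gradient of f(theta) = theta**2 at 3.0 is 6.0
theta = theta - alpha * grad   # 3.0 - 0.3 * 6.0 = 1.2
print(theta)                   # 1.2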
1. Define the Function and Its Derivative
We start with a simple quadratic function:
def f(theta):
    return theta**2  # Function we want to minimize
Its derivative (gradient) is:
def gradient(theta):
    return 2 * theta  # Derivative: f'(theta) = 2*theta
- f(theta): this is our function, and we want to find the value of theta that minimizes it;
- gradient(theta): this tells us the slope at any point theta, which we use to determine the update direction.
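As a quick sanity check (using a hypothetical helper, not part of the lesson's code), you can compare the analytic derivative against a central finite-difference approximation; the two values should agree closely at any test point:

def numerical_gradient(func, theta, eps=1e-6):
    # Central difference: (f(theta + eps) - f(theta - eps)) / (2 * eps)
    return (func(theta + eps) - func(theta - eps)) / (2 * eps)

print(gradient(3.0))               # 6.0 (analytic)
print(numerical_gradient(f, 3.0))  # ~6.0 (numerical approximation)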
2. Initialize Gradient Descent Parameters
alpha = 0.3 # Learning rate
theta = 3.0 # Initial starting point
tolerance = 1e-5 # Convergence threshold
max_iterations = 20 # Maximum number of updates
- alpha (learning rate): controls how big each step is;
- theta (initial guess): the starting point for descent;
- tolerance: when the updates become tiny, we stop;
- max_iterations: ensures we don't loop forever.
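For this particular function, each update multiplies theta by a constant factor: new_theta = theta - alpha * 2 * theta = (1 - 2 * alpha) * theta. With alpha = 0.3 the factor is 0.4, so the iterates shrink quickly; once alpha exceeds 1, the factor's magnitude exceeds 1 and the iterates diverge. A small sketch (the test values here are assumptions, not part of the lesson) illustrates this:

for alpha_test in [0.1, 0.3, 0.9, 1.1]:  # assumed test values
    theta_test = 3.0
    for _ in range(5):
        theta_test -= alpha_test * 2 * theta_test  # same update rule
    print(f"alpha = {alpha_test}: theta after 5 steps = {theta_test:.4f}")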
3. Perform Gradient Descent
for i in range(max_iterations):
    grad = gradient(theta)            # Compute gradient
    new_theta = theta - alpha * grad  # Update rule
    print(f"Step {i}: theta = {new_theta:.6f}")  # Monitor progress
    if abs(new_theta - theta) < tolerance:
        print("Converged!")
        break
    theta = new_theta
- Calculate the gradient at theta;
- Update theta using the gradient descent formula;
- Stop when updates are too small (convergence);
- Print each step to monitor progress.
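Because each step multiplies theta by (1 - 2 * alpha), the iterates for this function follow the closed form theta_k = theta_0 * (1 - 2 * alpha)**k. A short check (an illustrative sketch, not part of the lesson's code) confirms the loop matches it:

theta_0, alpha = 3.0, 0.3

# Iterative gradient descent for 5 steps
theta = theta_0
for k in range(5):
    theta -= alpha * 2 * theta

# Closed form: theta_k = theta_0 * (1 - 2*alpha)**k
print(theta)                          # 0.03072
print(theta_0 * (1 - 2 * alpha)**5)  # 0.03072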
4. Visualizing Gradient Descent
import matplotlib.pyplot as plt
import numpy as np

def f(theta):
    return theta**2  # Function we want to minimize

def gradient(theta):
    return 2 * theta  # Derivative: f'(theta) = 2*theta

alpha = 0.3           # Learning rate
theta = 3.0           # Initial starting point
tolerance = 1e-5      # Convergence threshold
max_iterations = 20   # Maximum number of updates

theta_values = [theta]       # Track parameter values
output_values = [f(theta)]   # Track function values

for i in range(max_iterations):
    grad = gradient(theta)            # Compute gradient
    new_theta = theta - alpha * grad  # Update rule
    if abs(new_theta - theta) < tolerance:
        break
    theta = new_theta
    theta_values.append(theta)
    output_values.append(f(theta))

# Prepare data for plotting the full function curve
theta_range = np.linspace(-4, 4, 100)
output_range = f(theta_range)

# Plot
plt.plot(theta_range, output_range, label="f(θ) = θ²", color='black')
plt.scatter(theta_values, output_values, color='red', label="Gradient Descent Steps")
plt.title("Gradient Descent Visualization")
plt.xlabel("θ")
plt.ylabel("f(θ)")
plt.legend()
plt.grid(True)
plt.show()
This plot shows:
- The function curve f(θ) = θ²;
- Red dots representing each gradient descent step until convergence.
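As an optional experiment building on the same script (the learning-rate values here are assumptions, not part of the lesson), you can overlay the step sequences for several learning rates to see how alpha shapes the descent path:

import matplotlib.pyplot as plt
import numpy as np

def f(theta):
    return theta**2

theta_range = np.linspace(-4, 4, 100)
plt.plot(theta_range, f(theta_range), color='black', label="f(θ) = θ²")

for alpha in [0.1, 0.3, 0.9]:  # assumed test values
    theta = 3.0
    thetas = [theta]
    for _ in range(20):
        theta -= alpha * 2 * theta  # gradient step for f(θ) = θ²
        thetas.append(theta)
    plt.scatter(thetas, [f(t) for t in thetas], label=f"alpha = {alpha}")

plt.xlabel("θ")
plt.ylabel("f(θ)")
plt.legend()
plt.grid(True)
plt.show()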