Visualizing Gradient Steps | Gradient Descent Mechanics
Mathematics of Optimization in ML

Visualizing Gradient Steps

Understanding the mechanics of gradient descent involves more than just computing gradients; you must also visualize how each step moves along the loss surface. Imagine a 2D quadratic surface, such as a bowl-shaped function, where each point represents a possible set of parameters and the height represents the loss. Gradient descent starts at an initial point and, at each step, moves in the direction of steepest descent, tracing a path that ideally leads to the minimum of the surface. This path is a sequence of points, each ideally closer to the minimum than the last, and the trajectory depends on both the shape of the loss surface and the starting location. By visualizing these steps, you gain intuition for how optimization algorithms navigate the landscape and how the choice of learning rate or initialization can affect convergence.
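Concretely, each update follows the standard gradient descent rule

θ_{t+1} = θ_t − η ∇f(θ_t)

where η is the learning rate and ∇f(θ_t) is the gradient evaluated at the current parameters. For the bowl-shaped surface used in the code below, f(x, y) = (x − 2)² + (y + 3)², the gradient is (2(x − 2), 2(y + 3)), so each negative-gradient step points directly at the minimum (2, −3), and its length is controlled by the learning rate.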

import numpy as np
import matplotlib.pyplot as plt

# Define the quadratic loss surface: f(x, y) = (x-2)^2 + (y+3)^2
def loss_surface(point):
    x, y = point
    return (x - 2)**2 + (y + 3)**2

# Gradient of the loss surface
def gradient(point):
    x, y = point
    df_dx = 2 * (x - 2)
    df_dy = 2 * (y + 3)
    return np.array([df_dx, df_dy])

# Gradient descent parameters
learning_rate = 0.2
steps = 15
start_point = np.array([-5.0, 5.0])
points = [start_point.copy()]

# Perform gradient descent
current_point = start_point.copy()
for _ in range(steps):
    grad = gradient(current_point)
    current_point = current_point - learning_rate * grad
    points.append(current_point.copy())

points = np.array(points)

# Plot the surface and the trajectory
x = np.linspace(-6, 6, 100)
y = np.linspace(-8, 8, 100)
X, Y = np.meshgrid(x, y)
Z = (X - 2)**2 + (Y + 3)**2

plt.figure(figsize=(8, 6))
plt.contour(X, Y, Z, levels=30, cmap="viridis")
plt.plot(points[:, 0], points[:, 1], marker="o", color="red", label="Gradient steps")
plt.scatter(2, -3, color="green", s=100, label="Minimum")
plt.title("Gradient Descent Trajectory on a 2D Quadratic Surface")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()
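To see how the learning rate shapes this trajectory, you can rerun the same update with different step sizes. The sketch below is a minimal, self-contained variation (the specific learning rates are arbitrary choices for illustration). For this quadratic, the distance to the minimum shrinks by a factor of |1 − 2η| per step, so η = 0.5 lands on the minimum in a single step, values between 0.5 and 1 overshoot and oscillate while still converging, and values above 1 diverge.

import numpy as np

# Same quadratic surface as above: f(x, y) = (x - 2)^2 + (y + 3)^2
def gradient(point):
    x, y = point
    return np.array([2 * (x - 2), 2 * (y + 3)])

minimum = np.array([2.0, -3.0])
start = np.array([-5.0, 5.0])

# Distance to the minimum after 15 steps for several learning rates.
# The error shrinks by |1 - 2 * lr| per step, so lr = 0.5 converges
# immediately, while lr = 1.1 moves farther away at every step.
for lr in [0.05, 0.2, 0.5, 0.9, 1.1]:
    point = start.copy()
    for _ in range(15):
        point = point - lr * gradient(point)
    distance = np.linalg.norm(point - minimum)
    print(f"learning rate {lr:>4}: distance to minimum after 15 steps = {distance:.4f}")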
Note

The path taken by gradient descent can look very different depending on where you start. If you initialize closer to the minimum, the trajectory is shorter and more direct; starting farther away or in a region with a steep slope can lead to longer, curved, or even oscillating paths before reaching the minimum.
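If you want to check this on the same bowl-shaped surface, the short sketch below counts how many steps each of a few starting points needs to get within 0.01 of the minimum. The starting points and the tolerance are arbitrary choices for illustration, not part of the original example.

import numpy as np

# Same quadratic surface as above: f(x, y) = (x - 2)^2 + (y + 3)^2
def gradient(point):
    x, y = point
    return np.array([2 * (x - 2), 2 * (y + 3)])

minimum = np.array([2.0, -3.0])
learning_rate = 0.2
starts = [np.array([-5.0, 5.0]), np.array([0.0, -2.0]), np.array([6.0, -7.0])]

# Count the steps needed to get within 0.01 of the minimum from each start.
for start in starts:
    point = start.copy()
    steps_taken = 0
    while np.linalg.norm(point - minimum) > 0.01 and steps_taken < 100:
        point = point - learning_rate * gradient(point)
        steps_taken += 1
    print(f"start {start}: within 0.01 of the minimum after {steps_taken} steps")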


Which of the following statements best explains why the trajectory of gradient descent can differ for the same loss surface?

Select the correct answer


Section 2. Chapter 3

