Noisy Gradient Trajectories
In optimization for machine learning, you often work with noisy gradients rather than perfect, exact gradients. This noise can come from subsampling the data (as in mini-batch or stochastic gradient descent), from measurement error, or from inherent randomness in the data itself. When you introduce noise into the gradient computation, the optimization trajectory (the path your parameters trace across the loss landscape) becomes less predictable and more erratic. Instead of following a smooth, direct descent to a minimum, the path may zigzag, overshoot, or get temporarily stuck in flat regions. If the noise is too large, this randomness can slow convergence or even prevent the parameters from reaching the minimum, but it can also help the search escape shallow local minima or plateaus by injecting variability into the process.
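Before visualizing full trajectories, it can help to see where such noise actually comes from. The minimal sketch below (using a synthetic linear-regression dataset and an arbitrary choice of batch sizes, both assumptions not taken from the lesson) shows that a mini-batch gradient is a noisy estimate of the full-batch gradient, and that its spread shrinks as the batch size grows.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data: y = 2x + noise (illustrative assumption)
X = rng.normal(size=(1000, 1))
y = 2 * X[:, 0] + 0.5 * rng.normal(size=1000)

w = 0.0  # current parameter estimate

def full_gradient(w):
    # Gradient of the mean squared error over the whole dataset
    residual = X[:, 0] * w - y
    return np.mean(2 * residual * X[:, 0])

def minibatch_gradient(w, batch_size):
    # Gradient estimated from a random subsample of the data
    idx = rng.choice(len(y), size=batch_size, replace=False)
    residual = X[idx, 0] * w - y[idx]
    return np.mean(2 * residual * X[idx, 0])

print("Full-batch gradient:", round(full_gradient(w), 3))
for batch_size in (4, 32, 256):
    estimates = [minibatch_gradient(w, batch_size) for _ in range(5)]
    print(f"batch_size={batch_size}:", np.round(estimates, 3))

Each mini-batch estimate scatters around the full-batch value; that scatter is exactly the gradient noise that makes the trajectories in the next script wobble.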
import numpy as np
import matplotlib.pyplot as plt

# Define a simple 2D quadratic loss surface
def loss_surface(x, y):
    return 0.5 * (x ** 2 + 2 * y ** 2)

# Gradient of the surface
def grad(x, y):
    return np.array([x, 2 * y])

# Parameters for gradient descent
start_points = [(-2, 2), (2, -2), (-2, -2), (2, 2)]
lr = 0.1
steps = 40
noise_scale = 0.2

# Create a meshgrid to plot the surface
x_vals = np.linspace(-3, 3, 100)
y_vals = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x_vals, y_vals)
Z = loss_surface(X, Y)

plt.figure(figsize=(8, 6))
plt.contourf(X, Y, Z, levels=30, cmap="Blues", alpha=0.7)
plt.colorbar(label="Loss")

# Plot multiple noisy gradient descent paths
for sx, sy in start_points:
    x, y = sx, sy
    path_x, path_y = [x], [y]
    for _ in range(steps):
        g = grad(x, y)
        noise = np.random.normal(0, noise_scale, size=2)
        x = x - lr * (g[0] + noise[0])
        y = y - lr * (g[1] + noise[1])
        path_x.append(x)
        path_y.append(y)
    plt.plot(path_x, path_y, marker="o", markersize=2, linewidth=1.5)

plt.title("Noisy Gradient Descent Trajectories on 2D Surface")
plt.xlabel("x")
plt.ylabel("y")
plt.show()
Noisy optimization methods, such as stochastic gradient descent, strike a balance between exploration and exploitation. While noise can disrupt direct convergence, it also helps algorithms explore the loss landscape, potentially escaping shallow minima and discovering better solutions. The right amount of noise encourages sufficient exploration without preventing eventual convergence.
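One simple way to probe this trade-off is to rerun the same noisy descent with different noise levels and compare how close each run finishes to the minimum at the origin. The sketch below extends the script above; the particular noise_scale values, step count, and number of repeated runs are arbitrary choices for illustration, not part of the lesson's setup.

import numpy as np

def grad(x, y):
    # Gradient of the quadratic surface 0.5 * (x**2 + 2 * y**2)
    return np.array([x, 2 * y])

rng = np.random.default_rng(42)
lr = 0.1
steps = 200

# Compare how far from the minimum (0, 0) each run finishes
for noise_scale in (0.0, 0.2, 1.0):
    final_distances = []
    for _ in range(20):  # average over several runs per noise level
        pos = np.array([2.0, 2.0])
        for _ in range(steps):
            noise = rng.normal(0, noise_scale, size=2)
            pos = pos - lr * (grad(pos[0], pos[1]) + noise)
        final_distances.append(np.linalg.norm(pos))
    print(f"noise_scale={noise_scale}: mean final distance "
          f"to minimum = {np.mean(final_distances):.3f}")

With no noise the runs settle essentially at the minimum, moderate noise leaves them hovering nearby, and large noise keeps them bouncing well away from it, which is why noise is often reduced (for example by decaying the learning rate) as training progresses.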