Noisy Gradient Trajectories
In optimization for machine learning, you often work with noisy gradients rather than perfect, exact gradients. This noise can come from subsampling the data (as in mini-batch or stochastic gradient descent), from measurement error, or from inherent randomness in the data itself. When you introduce noise into the gradient computation, the optimization trajectory (the path your parameters trace across the loss landscape) becomes less predictable and more erratic. Instead of following a smooth, direct descent to a minimum, the path may zigzag, overshoot, or get temporarily stuck in flat regions. If the noise is too large, this randomness can slow convergence or even prevent the path from reaching the minimum, but it can also help escape shallow local minima or plateaus by injecting variability into the search process.
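To make the subsampling idea concrete, the short sketch below estimates the gradient of a mean-squared-error loss from a random mini-batch and compares it with the full-batch gradient; the spread of the mini-batch estimates around the exact value is precisely the noise described above. The linear-regression setup, dataset size, and batch size here are assumptions chosen for illustration and are not part of the demo that follows, which instead adds synthetic Gaussian noise directly to an exact gradient so the noise level is easy to control and visualize.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data: y = 3*x + noise (assumed setup for illustration)
n_samples = 1000
x_data = rng.normal(size=n_samples)
y_data = 3.0 * x_data + rng.normal(scale=0.5, size=n_samples)

def full_gradient(w):
    # Exact gradient of the mean-squared-error loss over the whole dataset
    residual = w * x_data - y_data
    return 2.0 * np.mean(residual * x_data)

def minibatch_gradient(w, batch_size=32):
    # Noisy estimate of the same gradient from a random subsample of the data
    idx = rng.choice(n_samples, size=batch_size, replace=False)
    residual = w * x_data[idx] - y_data[idx]
    return 2.0 * np.mean(residual * x_data[idx])

w = 0.0
print("full-batch gradient:", full_gradient(w))
print("mini-batch estimates:", [round(minibatch_gradient(w), 3) for _ in range(5)])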
import numpy as np
import matplotlib.pyplot as plt

# Define a simple 2D quadratic loss surface
def loss_surface(x, y):
    return 0.5 * (x ** 2 + 2 * y ** 2)

# Gradient of the surface
def grad(x, y):
    return np.array([x, 2 * y])

# Parameters for gradient descent
start_points = [(-2, 2), (2, -2), (-2, -2), (2, 2)]
lr = 0.1
steps = 40
noise_scale = 0.2

# Create a meshgrid to plot the surface
x_vals = np.linspace(-3, 3, 100)
y_vals = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x_vals, y_vals)
Z = loss_surface(X, Y)

plt.figure(figsize=(8, 6))
plt.contourf(X, Y, Z, levels=30, cmap="Blues", alpha=0.7)
plt.colorbar(label="Loss")

# Plot multiple noisy gradient descent paths
for sx, sy in start_points:
    x, y = sx, sy
    path_x, path_y = [x], [y]
    for _ in range(steps):
        g = grad(x, y)
        noise = np.random.normal(0, noise_scale, size=2)
        x = x - lr * (g[0] + noise[0])
        y = y - lr * (g[1] + noise[1])
        path_x.append(x)
        path_y.append(y)
    plt.plot(path_x, path_y, marker="o", markersize=2, linewidth=1.5)

plt.title("Noisy Gradient Descent Trajectories on 2D Surface")
plt.xlabel("x")
plt.ylabel("y")
plt.show()
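To see how the noise magnitude affects convergence, you can rerun the same update rule with a few different values of noise_scale and measure how far the final point lands from the minimum at the origin. The sketch below reuses grad, lr, and steps from the code above; the particular scales compared (0.0, 0.2, and 1.0) are arbitrary choices for illustration.

# Effect of noise magnitude: rerun the update rule with different noise scales
# and report the final distance from the minimum at (0, 0).
# Reuses grad, lr, and steps defined in the code above.
for scale in [0.0, 0.2, 1.0]:
    x, y = 2.0, 2.0
    for _ in range(steps):
        g = grad(x, y)
        noise = np.random.normal(0, scale, size=2)
        x = x - lr * (g[0] + noise[0])
        y = y - lr * (g[1] + noise[1])
    print(f"noise_scale={scale}: final distance from minimum = {np.hypot(x, y):.3f}")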
Noisy optimization methods, such as stochastic gradient descent, strike a balance between exploration and exploitation. While noise can disrupt direct convergence, it also helps algorithms explore the loss landscape, potentially escaping shallow minima and discovering better solutions. The right amount of noise encourages sufficient exploration without preventing eventual convergence.
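A common way to strike that balance in practice is to shrink the step size as training progresses, which also shrinks the noise injected per step: early iterations explore broadly, later ones settle near a minimum. The sketch below illustrates this on the same quadratic surface, reusing grad and noise_scale from the earlier code; the initial step size and the 1/(1 + t)-style decay schedule are assumptions chosen for simplicity rather than a prescribed recipe.

# Noisy gradient descent with a decaying learning rate: because the noise enters
# through the update, smaller later steps also mean smaller effective noise,
# so the iterate can settle near the minimum. Reuses grad and noise_scale from above.
x, y = 2.0, 2.0
base_lr = 0.3                            # assumed initial step size for this sketch
for t in range(200):
    lr_t = base_lr / (1.0 + 0.05 * t)    # simple 1/(1 + t)-style decay
    g = grad(x, y)
    noise = np.random.normal(0, noise_scale, size=2)
    x = x - lr_t * (g[0] + noise[0])
    y = y - lr_t * (g[1] + noise[1])
print(f"final point after decaying-step run: ({x:.3f}, {y:.3f})")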