Inertia and Damping Effects
Understanding the roles of inertia and damping is crucial for mastering momentum-based optimization methods in machine learning. Inertia refers to the tendency of an optimizer to continue moving in its current direction, while damping acts as a counterforce that reduces this motion, preventing runaway behavior. Mathematically, these effects can be described using difference equations similar to those in classical mechanics. When you apply momentum in gradient descent, the update rule typically takes the form:
v_{t+1} = \beta v_t - \alpha \nabla f(x_t)
x_{t+1} = x_t + v_{t+1}

Here, v_t is the velocity (accumulated gradient), β is the momentum coefficient (inertia), and α is the learning rate. The term β·v_t introduces inertia, causing the optimizer to continue moving in previous directions, which can help speed up convergence in shallow valleys. However, too much inertia can lead to overshooting and oscillations around the minimum. Damping is implicitly controlled by the value of β: higher β means less damping and more oscillatory behavior, while lower β increases damping, resulting in smoother, slower convergence. The balance between inertia and damping determines whether the optimizer approaches the minimum smoothly, oscillates, or even diverges.
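To see the inertia effect concretely, here is a minimal hand-trace of the update rule. It is an illustrative sketch using the same quadratic loss f(x) = 0.5·x² as the full example later in this lesson; the values β = 0.9 and α = 0.1 are chosen only for demonstration, not as recommendations.

# Illustrative hand-trace of the momentum update on f(x) = 0.5 * x^2 (gradient = x).
# beta and alpha are example values only.
beta, alpha = 0.9, 0.1
x, v = 5.0, 0.0
for t in range(5):
    v = beta * v - alpha * x   # inertia term (beta * v) plus the new gradient step
    x = x + v                  # position update
    print(f"step {t + 1}: v = {v:+.4f}, x = {x:+.4f}")

With these values the velocity grows in magnitude for several steps, and by the fifth update x has already overshot the minimum at x = 0: exactly the inertia-driven overshoot described above.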
When tuning momentum parameters, remember that increasing the momentum coefficient (β) can accelerate convergence, but may also cause persistent oscillations or even instability if set too high. Practical values for β often range from 0.8 to 0.99. Start with moderate values (like 0.9), observe the behavior, and adjust based on whether you see excessive oscillation or sluggish progress. Always monitor both loss curves and parameter trajectories to find the right balance.
import numpy as np
import matplotlib.pyplot as plt

# Quadratic loss: f(x) = 0.5 * x^2, so the gradient is simply x
def grad(x):
    return x

def momentum_optimize(beta, alpha, x0, steps):
    x = x0
    v = 0
    trajectory = [x]
    for _ in range(steps):
        v = beta * v - alpha * grad(x)  # inertia term (beta * v) plus gradient step
        x = x + v
        trajectory.append(x)
    return np.array(trajectory)

alpha = 0.1
x_start = 5
steps = 30
betas = [0.0, 0.7, 0.9, 0.99]

plt.figure(figsize=(8, 5))
for beta in betas:
    traj = momentum_optimize(beta, alpha, x_start, steps)
    plt.plot(traj, label=f"beta={beta}")

plt.axhline(0, color='gray', linestyle='--')
plt.title("Momentum: Oscillatory vs. Smooth Convergence")
plt.xlabel("Step")
plt.ylabel("x value")
plt.legend()
plt.show()
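When you run this example, the β = 0 curve decays toward zero monotonically (plain gradient descent), while the larger-β curves overshoot and oscillate around the minimum before settling, with β = 0.99 taking the longest to damp out: the inertia-versus-damping trade-off in action.

Tying back to the tuning advice above, one simple way to quantify oscillation in a one-dimensional trajectory is to count how often consecutive iterates land on opposite sides of the minimum. The sketch below is an illustrative diagnostic, not a standard library routine; count_sign_changes is a hypothetical helper, and it reuses momentum_optimize from the example above.

import numpy as np

def count_sign_changes(trajectory):
    # Number of times consecutive iterates switch sign, i.e. cross the minimum at x = 0
    signs = np.sign(trajectory)
    return int(np.sum(signs[1:] != signs[:-1]))

for beta in [0.0, 0.7, 0.9, 0.99]:
    traj = momentum_optimize(beta, 0.1, 5, 30)
    print(f"beta={beta}: {count_sign_changes(traj)} sign changes")

Many sign changes for a given β suggest it is too high for the chosen learning rate; few or none, combined with sluggish progress, suggest it can safely be increased.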