Loss Surfaces and Gradients
When you train a neural network, you are searching for parameters (weights and biases) that minimize a loss function: a quantitative measure of how well your model predicts outcomes. For a simple case with one input (x), one weight (w), and a true output (y_true), the loss function can be written as:
L(w) = (w × x − y_true)²

This equation defines the loss surface: for every possible value of w, you get a corresponding loss value. The shape of this surface (its peaks, valleys, and flat regions) determines how easy it is to find the minimum loss.
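To make this concrete, plug in the same values the code below uses (x = 2, y_true = 4): at w = 1 the prediction is w × x = 2 and the loss is (2 − 4)² = 4, while at w = 2 the prediction matches the target and the loss is 0.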
The gradient of the loss with respect to w describes how the loss changes as you adjust w. For this loss, the gradient is:
dL/dw = 2 × (w × x − y_true) × x

This tells you the slope of the loss surface at any point w. During optimization, you use this gradient to update w in the direction that reduces the loss. If the gradient is positive, decreasing w reduces the loss; if negative, increasing w reduces the loss. This process of moving against the gradient guides the network toward the minimum of the loss surface.
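As a quick check with the values used below (x = 2, y_true = 4, minimum at w = 2): at w = 1 the gradient is 2 × (2 − 4) × 2 = −8, so you increase w; at w = 3 it is 2 × (6 − 4) × 2 = 8, so you decrease w. Either way, stepping against the gradient moves w toward 2.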
import numpy as np
import matplotlib.pyplot as plt

# Simple neural network: one input, one weight, no bias, squared error loss
def loss(w, x, y_true):
    return (w * x - y_true) ** 2

def grad(w, x, y_true):
    return 2 * (w * x - y_true) * x

# Parameters
x = 2.0
y_true = 4.0

# Range of weight values to visualize
w_values = np.linspace(-1, 3, 100)
loss_values = loss(w_values, x, y_true)
grad_values = grad(w_values, x, y_true)

# Plot the loss surface
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.plot(w_values, loss_values, label='Loss L(w)')
plt.xlabel('Weight w')
plt.ylabel('Loss')
plt.title('Loss Surface')
plt.grid(True)
plt.legend()

# Plot the gradient
plt.subplot(1, 2, 2)
plt.plot(w_values, grad_values, color='orange', label='Gradient dL/dw')
plt.axhline(0, color='gray', linestyle='--', linewidth=1)
plt.xlabel('Weight w')
plt.ylabel('Gradient')
plt.title('Gradient of Loss w.r.t. w')
plt.grid(True)
plt.legend()

plt.tight_layout()
plt.show()
The code sample visualizes both the loss surface and its gradient for a simple neural network with one weight. Using NumPy, you calculate arrays of loss and gradient values across a range of weights. The loss is the squared difference between the predicted output (w * x) and the true value (y_true), which forms a parabola when plotted. Matplotlib displays the two plots side by side: one for the loss surface and one for the gradient.
The loss surface plot shows how the loss changes as you vary the weight, revealing a clear minimum point. The gradient plot displays the slope of the loss at each weight value: positive for weights above the minimum, negative for weights below it, and zero exactly at the minimum. This slope tells you how to adjust the weight to reduce the loss. Gradient descent uses this information, updating the weight in the direction that lowers the loss, always moving toward the bottom of the parabola where the loss is minimized.
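To see this update rule in action, here is a minimal gradient descent sketch that reuses the loss and gradient defined above. The starting weight (0.0), learning rate (0.1), and number of steps (10) are illustrative choices, not values fixed by the lesson.

# Minimal gradient descent sketch using the same loss and gradient as above
def loss(w, x, y_true):
    return (w * x - y_true) ** 2

def grad(w, x, y_true):
    return 2 * (w * x - y_true) * x

x = 2.0
y_true = 4.0
learning_rate = 0.1  # assumed value; the lesson does not specify one
w = 0.0              # arbitrary starting weight

for step in range(10):
    g = grad(w, x, y_true)
    w -= learning_rate * g  # move against the gradient
    print(f"step {step + 1}: w = {w:.4f}, loss = {loss(w, x, y_true):.4f}")

Running this prints w converging toward 2, where the prediction w × x equals y_true and the loss reaches zero, which is exactly the bottom of the parabola shown in the loss surface plot.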