Optimization and Regularization in Neural Networks with Python

Loss Surfaces and Gradients

When you train a neural network, you are searching for parameters (weights and biases) that minimize a loss function, a quantitative measure of how well your model predicts outcomes. For a simple case with one input x, one weight w, and a true output y_true, the loss function can be written as:

L(w) = (w \times x - y_{true})^2

This equation defines the loss surface: for every possible value of w, you get a corresponding loss value. The shape of this surface, with its peaks, valleys, and flat regions, determines how easy it is to find the minimum loss.
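
For example, using the same values as the code sample below (x = 2 and y_true = 4), a few sample points trace out the surface:

L(1) = (1 \times 2 - 4)^2 = 4, \quad L(2) = (2 \times 2 - 4)^2 = 0, \quad L(3) = (3 \times 2 - 4)^2 = 4

The loss is zero at w = 2 and grows symmetrically on either side, so this particular surface is a parabola with a single valley.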

The gradient of the loss with respect to w shows how the loss changes as you adjust w. The gradient is:

\frac{dL}{dw} = 2 \times (w \times x - y_{true}) \times x

This tells you the slope of the loss surface at any point w. During optimization, you use this gradient to update w in the direction that reduces the loss. If the gradient is positive, decreasing w reduces the loss; if negative, increasing w reduces the loss. This process of moving against the gradient guides the network toward the minimum of the loss surface.
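
For example, at w = 1 with x = 2 and y_true = 4 (the same values used in the code sample below), the gradient is \frac{dL}{dw} = 2 \times (1 \times 2 - 4) \times 2 = -8. The negative sign means the loss decreases as w increases, so the update moves w upward, toward the minimum at w = 2.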

import numpy as np
import matplotlib.pyplot as plt

# Simple neural network: one input, one weight, no bias, squared error loss
def loss(w, x, y_true):
    return (w * x - y_true) ** 2

def grad(w, x, y_true):
    return 2 * (w * x - y_true) * x

# Parameters
x = 2.0
y_true = 4.0

# Range of weight values to visualize
w_values = np.linspace(-1, 3, 100)
loss_values = loss(w_values, x, y_true)
grad_values = grad(w_values, x, y_true)

# Plot the loss surface
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.plot(w_values, loss_values, label='Loss L(w)')
plt.xlabel('Weight w')
plt.ylabel('Loss')
plt.title('Loss Surface')
plt.grid(True)
plt.legend()

# Plot the gradient
plt.subplot(1, 2, 2)
plt.plot(w_values, grad_values, color='orange', label='Gradient dL/dw')
plt.axhline(0, color='gray', linestyle='--', linewidth=1)
plt.xlabel('Weight w')
plt.ylabel('Gradient')
plt.title('Gradient of Loss w.r.t. w')
plt.grid(True)
plt.legend()

plt.tight_layout()
plt.show()

The code sample visualizes both the loss surface and its gradient for a simple neural network with one weight. Using numpy, you calculate arrays of loss and gradient values for a range of weight values. The loss is computed as the squared difference between the predicted output (w * x) and the true value (y_true), forming a parabolic curve when plotted. Matplotlib is used to display two plots side by side: one for the loss surface and one for the gradient.

The loss surface plot shows how the loss changes as you vary the weight, revealing a clear minimum point. The gradient plot displays the slope of the loss at each weight value: positive for weights above the minimizing value, negative for weights below it, and zero exactly at the minimum. This slope tells you how to adjust the weight to reduce the loss. Gradient descent uses this information, updating the weight in the direction that lowers the loss, always moving toward the bottom of the parabola where the loss is minimized.
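
Gradient descent repeatedly applies the update w \leftarrow w - \eta \times \frac{dL}{dw}, where \eta is the learning rate. Below is a minimal sketch of this loop for the one-weight model; the starting weight w = 0.0, the learning rate lr = 0.1, and the step count of 10 are illustrative assumptions rather than values from the lesson.

# Minimal gradient-descent sketch for the one-weight model.
# The starting weight, learning rate, and step count are assumptions.
x = 2.0
y_true = 4.0
w = 0.0      # starting weight (assumed)
lr = 0.1     # learning rate (assumed)

for step in range(10):
    g = 2 * (w * x - y_true) * x   # dL/dw at the current weight
    w -= lr * g                    # move against the gradient
    print(f"step {step + 1}: w = {w:.4f}, loss = {(w * x - y_true) ** 2:.6f}")

Each step moves w a fraction of the way toward w = 2, and the printed loss shrinks toward zero, which is exactly the walk down the parabola described above.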



