Loss Surfaces and Gradients
When you train a neural network, you are searching for parameters (weights and biases) that minimize a loss function: a quantitative measure of how well your model predicts outcomes. For a simple case with one input (x), one weight (w), and a true output (y_true), the loss function can be written as:
L(w) = (w × x − y_true)²

This equation defines the loss surface: for every possible value of w, you get a corresponding loss value. The shape of this surface (its peaks, valleys, and flat regions) determines how easy it is to find the minimum loss.
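To make this concrete, plug in the same values the code below uses (x = 2, y_true = 4): at w = 1 the prediction is w × x = 2 and the loss is (2 − 4)² = 4, while at w = 2 the prediction matches the target and the loss is 0.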
The gradient of the loss with respect to w describes how the loss changes as you adjust w. For this loss, the gradient is:
dL/dw = 2 × (w × x − y_true) × x

This tells you the slope of the loss surface at any point w. During optimization, you use this gradient to update w in the direction that reduces the loss. If the gradient is positive, decreasing w reduces the loss; if negative, increasing w reduces the loss. This process of moving against the gradient guides the network toward the minimum of the loss surface.
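As a quick check with the values used below (x = 2, y_true = 4, minimum at w = 2): at w = 1 the gradient is 2 × (2 − 4) × 2 = −8, so you increase w; at w = 3 it is 2 × (6 − 4) × 2 = 8, so you decrease w. Either way, stepping against the gradient moves w toward 2.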
import numpy as np
import matplotlib.pyplot as plt

# Simple neural network: one input, one weight, no bias, squared error loss
def loss(w, x, y_true):
    return (w * x - y_true) ** 2

def grad(w, x, y_true):
    return 2 * (w * x - y_true) * x

# Parameters
x = 2.0
y_true = 4.0

# Range of weight values to visualize
w_values = np.linspace(-1, 3, 100)
loss_values = loss(w_values, x, y_true)
grad_values = grad(w_values, x, y_true)

# Plot the loss surface
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.plot(w_values, loss_values, label='Loss L(w)')
plt.xlabel('Weight w')
plt.ylabel('Loss')
plt.title('Loss Surface')
plt.grid(True)
plt.legend()

# Plot the gradient
plt.subplot(1, 2, 2)
plt.plot(w_values, grad_values, color='orange', label='Gradient dL/dw')
plt.axhline(0, color='gray', linestyle='--', linewidth=1)
plt.xlabel('Weight w')
plt.ylabel('Gradient')
plt.title('Gradient of Loss w.r.t. w')
plt.grid(True)
plt.legend()

plt.tight_layout()
plt.show()
The code sample visualizes both the loss surface and its gradient for a simple neural network with one weight. Using NumPy, you calculate arrays of loss and gradient values across a range of weights. The loss is the squared difference between the predicted output (w * x) and the true value (y_true), which forms a parabola when plotted. Matplotlib displays the two plots side by side: one for the loss surface and one for the gradient.
The loss surface plot shows how the loss changes as you vary the weight, revealing a clear minimum point. The gradient plot displays the slope of the loss at each weight value: positive for weights above the minimum, negative for weights below it, and zero exactly at the minimum. This slope tells you how to adjust the weight to reduce the loss. Gradient descent uses this information, updating the weight in the direction that lowers the loss, always moving toward the bottom of the parabola where the loss is minimized.
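To see this update rule in action, here is a minimal gradient descent sketch that reuses the loss and gradient defined above. The starting weight (0.0), learning rate (0.1), and number of steps (10) are illustrative choices, not values fixed by the lesson.

# Minimal gradient descent sketch using the same loss and gradient as above
def loss(w, x, y_true):
    return (w * x - y_true) ** 2

def grad(w, x, y_true):
    return 2 * (w * x - y_true) * x

x = 2.0
y_true = 4.0
learning_rate = 0.1  # assumed value; the lesson does not specify one
w = 0.0              # arbitrary starting weight

for step in range(10):
    g = grad(w, x, y_true)
    w -= learning_rate * g  # move against the gradient
    print(f"step {step + 1}: w = {w:.4f}, loss = {loss(w, x, y_true):.4f}")

Running this prints w converging toward 2, where the prediction w × x equals y_true and the loss reaches zero, which is exactly the bottom of the parabola shown in the loss surface plot.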