Loss Surfaces and Gradients
When you train a neural network, you are searching for parameters (weights and biases) that minimize a loss function: a quantitative measure of how well your model predicts outcomes. For a simple case with one input (x), one weight (w), and a true output (y_true), the loss function can be written as:

L(w) = (w × x − y_true)²

This equation defines the loss surface: for every possible value of w, you get a corresponding loss value. The shape of this surface, with its peaks, valleys, and flat regions, determines how easy it is to find the minimum loss.
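To make the surface concrete, you can evaluate L(w) at a few candidate weights. The minimal sketch below uses the same example values as the visualization code later in this section (x = 2.0, y_true = 4.0); the specific weights tried are only illustrative.

# Evaluate the loss L(w) = (w * x - y_true)^2 at a few candidate weights
x, y_true = 2.0, 4.0

for w in [0.0, 1.0, 2.0, 3.0]:
    loss_value = (w * x - y_true) ** 2
    print(f"w = {w:.1f} -> loss = {loss_value:.1f}")

# The loss drops to 0 at w = 2.0, where w * x equals y_true, and grows on either side.

The printed values trace out the parabola you will plot below: large losses far from w = 2.0 and a single minimum where the prediction matches the true output.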
The gradient of the loss with respect to w shows how the loss changes as you adjust w. The gradient is:
dL/dw = 2 × (w × x − y_true) × x

This tells you the slope of the loss surface at any point w. During optimization, you use this gradient to update w in the direction that reduces the loss. If the gradient is positive, decreasing w reduces the loss; if negative, increasing w reduces the loss. This process of moving against the gradient guides the network toward the minimum of the loss surface.
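You can verify this sign rule numerically before plotting anything. In the sketch below, x = 2.0 and y_true = 4.0 (matching the code that follows), so the minimum sits at w = 2.0; the learning rate in the final step is an assumed value chosen only for illustration.

# Gradient dL/dw = 2 * (w * x - y_true) * x for the one-weight model
x, y_true = 2.0, 4.0

def grad(w):
    return 2 * (w * x - y_true) * x

print(grad(1.0))  # -8.0: negative gradient, so increasing w reduces the loss
print(grad(3.0))  #  8.0: positive gradient, so decreasing w reduces the loss
print(grad(2.0))  #  0.0: zero gradient at the minimum

# One update step against the gradient with an assumed learning rate of 0.1
w = 1.0
w = w - 0.1 * grad(w)
print(w)  # 1.8, which is closer to the minimum at w = 2.0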
import numpy as np
import matplotlib.pyplot as plt

# Simple neural network: one input, one weight, no bias, squared error loss
def loss(w, x, y_true):
    return (w * x - y_true) ** 2

def grad(w, x, y_true):
    return 2 * (w * x - y_true) * x

# Parameters
x = 2.0
y_true = 4.0

# Range of weight values to visualize
w_values = np.linspace(-1, 3, 100)
loss_values = loss(w_values, x, y_true)
grad_values = grad(w_values, x, y_true)

# Plot the loss surface
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.plot(w_values, loss_values, label='Loss L(w)')
plt.xlabel('Weight w')
plt.ylabel('Loss')
plt.title('Loss Surface')
plt.grid(True)
plt.legend()

# Plot the gradient
plt.subplot(1, 2, 2)
plt.plot(w_values, grad_values, color='orange', label='Gradient dL/dw')
plt.axhline(0, color='gray', linestyle='--', linewidth=1)
plt.xlabel('Weight w')
plt.ylabel('Gradient')
plt.title('Gradient of Loss w.r.t. w')
plt.grid(True)
plt.legend()

plt.tight_layout()
plt.show()
The plotting code above visualizes both the loss surface and its gradient for a simple neural network with one weight. Using NumPy, you calculate arrays of loss and gradient values for a range of weight values. The loss is computed as the squared difference between the predicted output (w * x) and the true value (y_true), forming a parabolic curve when plotted. Matplotlib is used to display two plots side by side: one for the loss surface and one for the gradient.
The loss surface plot shows how the loss changes as you vary the weight, revealing a clear minimum point. The gradient plot displays the slope of the loss at each weight value—positive above the minimum, negative below it, and zero exactly at the minimum. This slope tells you how to adjust the weight to reduce the loss. Gradient descent uses this information, updating the weight in the direction that lowers the loss, always moving toward the bottom of the parabola where the loss is minimized.
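To connect the plots to the update rule itself, here is a minimal gradient descent sketch for the same one-weight model; it redefines the loss and grad functions so it runs on its own, and the starting weight, learning rate, and number of steps are assumptions made for illustration rather than recommended settings.

# Gradient descent on the one-weight model: w moves toward the bottom of the parabola
def loss(w, x, y_true):
    return (w * x - y_true) ** 2

def grad(w, x, y_true):
    return 2 * (w * x - y_true) * x

x = 2.0
y_true = 4.0

w = -1.0               # assumed starting weight
learning_rate = 0.05   # assumed step size

for step in range(20):
    w = w - learning_rate * grad(w, x, y_true)  # move against the gradient
    if step % 5 == 0:
        print(f"step {step:2d}: w = {w:.4f}, loss = {loss(w, x, y_true):.4f}")

# Each update shrinks the gap to w = 2.0, where the gradient is zero and the loss is minimal.

Running this, the weight climbs from -1.0 toward 2.0 and the printed loss shrinks toward zero, which is exactly the behavior the loss-surface and gradient plots predict.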