Huber Loss: Combining MSE and MAE
The Huber loss offers a blend of the Mean Squared Error (MSE) and Mean Absolute Error (MAE), providing both the smoothness of MSE and the robustness of MAE. It is defined piecewise, switching between a quadratic and a linear penalty depending on the size of the prediction error. The Huber loss for a single prediction is given by:
$$
L_\delta(y, \hat{y}) =
\begin{cases}
\frac{1}{2}(y - \hat{y})^2 & \text{if } |y - \hat{y}| \le \delta \\
\delta \cdot \left( |y - \hat{y}| - \frac{1}{2}\delta \right) & \text{otherwise}
\end{cases}
$$

Here, $y$ is the true value, $\hat{y}$ is the predicted value, and $\delta$ is a positive threshold parameter.
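To make the piecewise definition concrete, here is a minimal NumPy sketch of the formula above (the function name `huber_loss` and the default `delta=1.0` are illustrative choices, not part of any standard API):

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    """Element-wise Huber loss, following the piecewise formula above."""
    error = y_true - y_pred
    abs_error = np.abs(error)
    # Quadratic branch for |error| <= delta, linear branch otherwise
    quadratic = 0.5 * error ** 2
    linear = delta * (abs_error - 0.5 * delta)
    return np.where(abs_error <= delta, quadratic, linear)
```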
Huber loss behaves like MSE for small errors, promoting smooth optimization and sensitivity to small deviations. For large errors, it switches to MAE behavior, reducing the influence of outliers and providing robustness. This balance makes Huber loss especially useful in datasets with occasional large errors or outliers.
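A quick numeric check illustrates this behavior (the threshold δ = 1 here is just an example value): for a small error the Huber loss coincides with the quadratic ½e² penalty, while for a large error it grows only linearly, staying close to the absolute error:

```python
delta = 1.0
for error in [0.5, 10.0]:
    quadratic = 0.5 * error ** 2   # MSE-style penalty
    absolute = abs(error)          # MAE-style penalty
    huber = (0.5 * error ** 2 if abs(error) <= delta
             else delta * (abs(error) - 0.5 * delta))
    print(f"error={error:>4}: quadratic={quadratic:6.2f}, "
          f"absolute={absolute:5.2f}, huber={huber:5.2f}")
```

For the error of 0.5 the Huber value equals the quadratic penalty (0.125), while for the error of 10 it is 9.5 rather than the quadratic's 50, so the outlier's influence is sharply reduced.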
The transition parameter δ is central to how the Huber loss functions. When the absolute error is less than or equal to δ, the loss is quadratic, just like MSE. This means small errors are penalized more strongly, encouraging precise predictions. When the error exceeds δ, the loss becomes linear, similar to MAE, which prevents large errors from having an outsized impact on the optimization process. By tuning δ, you can control the trade-off between sensitivity to small errors and robustness to outliers. A smaller δ makes the loss function more like MAE, while a larger δ makes it more like MSE.
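The effect of tuning δ can be seen by holding the residual fixed and varying the threshold (the residual of 4.0 and the δ values below are chosen purely for illustration):

```python
def huber(error, delta):
    """Huber loss for a single residual, per the piecewise definition."""
    if abs(error) <= delta:
        return 0.5 * error ** 2
    return delta * (abs(error) - 0.5 * delta)

# The same outlier-sized residual, penalized under different thresholds:
for delta in [0.5, 1.0, 5.0]:
    print(f"delta={delta}: loss at error=4.0 -> {huber(4.0, delta):.3f}")
```

With δ = 0.5 the outlier contributes only 1.875 (MAE-like), with δ = 1.0 it contributes 3.5, and with δ = 5.0 the quadratic branch applies and the penalty jumps to 8.0 (MSE-like).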