Mean Absolute Error (MAE): Robustness and Median Connection

When choosing a loss function for regression tasks, you often encounter both Mean Absolute Error (MAE) and Mean Squared Error (MSE). The MAE is defined as the average of the absolute differences between the true values ($y$) and the predicted values ($\hat{y}$). For $n$ samples, its formula is:

$$L_{MAE}(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$


import numpy as np
import matplotlib.pyplot as plt

errors = np.linspace(-4, 4, 400)
mae = np.abs(errors)
mse = errors**2

plt.plot(errors, mae, label="MAE")
plt.plot(errors, mse, label="MSE")
plt.title("MAE vs MSE Loss Functions")
plt.xlabel("Error (y - ŷ)")
plt.ylabel("Loss")
plt.legend()
plt.show()

Unlike MSE, which squares the error, MAE simply takes the absolute value. This difference has important consequences for how each loss function responds to large errors. While MSE penalizes large errors more heavily due to the squaring, MAE treats all errors in direct proportion to their magnitude. This means that the influence of any single, very large error is much less pronounced with MAE than with MSE.


import numpy as np

errors = np.array([1, 2, 3, 20])   # outlier = 20
print("MAE:", np.mean(np.abs(errors)))
print("MSE:", np.mean(errors**2))
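One way to make the influence of the outlier concrete is to look at the share of the total loss that each error contributes. The sketch below reuses the same four errors; the names `mae_share` and `mse_share` are illustrative and not part of the original lesson.

```python
import numpy as np

errors = np.array([1, 2, 3, 20])  # same errors as above; 20 is the outlier

# Per-sample loss terms before averaging
abs_terms = np.abs(errors)
sq_terms = errors**2

# Fraction of the total loss contributed by each sample
mae_share = abs_terms / abs_terms.sum()
mse_share = sq_terms / sq_terms.sum()

print(f"Outlier share of MAE: {mae_share[-1]:.3f}")
print(f"Outlier share of MSE: {mse_share[-1]:.3f}")
```

Under both losses the outlier accounts for most of the total, but under MSE it leaves almost nothing for the other three errors to contribute.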

Note

MAE is less sensitive to outliers than MSE, making it a robust choice when your data contains extreme values or follows a heavy-tailed distribution. This robustness helps prevent a few large errors from dominating the loss and distorting your model's learning process.

Mathematically, the connection between MAE and the median emerges when you look for the constant value that minimizes the MAE for a set of data points. If you have a set of observed values and want to choose a single value that minimizes the sum of absolute differences to all points, the optimal choice is the median of the data. The reason is that the median has as many points above it as below it: moving the constant away from the median increases its distance to more points than it decreases, so the total absolute deviation can only grow. In contrast, minimizing MSE leads to the mean as the optimal constant. Therefore, a model trained with MAE is pushed toward the median of the target values for a given input (the conditional median), while a model trained with MSE is pushed toward the mean.
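A short sketch of the argument, assuming $n$ observed values $y_1, \dots, y_n$ and a constant prediction $c$ (this notation is introduced here for illustration). For any $c$ that does not coincide with a data point, the derivative of the total absolute deviation is:

$$\frac{d}{dc} \sum_{i=1}^{n} |y_i - c| = -\sum_{i=1}^{n} \operatorname{sign}(y_i - c) = \#\{i : y_i < c\} - \#\{i : y_i > c\}$$

This is negative while more points lie above $c$ than below it, and positive once more lie below, so the minimum sits where the two counts balance, which is the median. For the squared error the same step gives:

$$\frac{d}{dc} \sum_{i=1}^{n} (y_i - c)^2 = -2 \sum_{i=1}^{n} (y_i - c) = 0 \;\Longrightarrow\; c = \frac{1}{n} \sum_{i=1}^{n} y_i$$

which is the mean.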


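import numpy as np

data = np.array([1, 2, 5, 8, 50])  # outlier = 50

# Two candidate constants: the mean and the median of the data
mean = np.mean(data)
median = np.median(data)

# Total (summed, not averaged) absolute deviation from each candidate
total_abs_dev_mean = np.sum(np.abs(data - mean))
total_abs_dev_median = np.sum(np.abs(data - median))

print("Mean:", mean, "| Total ABS deviation:", total_abs_dev_mean)
print("Median:", median, "| Total ABS deviation:", total_abs_dev_median)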
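The comparison above checks only two candidates. As a rough sanity check, a brute-force search over many candidate constants can confirm which one minimizes each loss. This is a minimal sketch, not an optimization method you would use in practice; the grid range and step size (0 to 60 in steps of 0.01) are arbitrary assumptions chosen to cover the data.

```python
import numpy as np

data = np.array([1, 2, 5, 8, 50])  # same data as above; 50 is the outlier

# Hypothetical grid of candidate constants: 0 to 60 in steps of 0.01
candidates = np.linspace(0, 60, 6001)

# Broadcasting: one row per candidate, one column per data point
residuals = data[None, :] - candidates[:, None]
total_abs = np.abs(residuals).sum(axis=1)
total_sq = (residuals**2).sum(axis=1)

# Candidate with the smallest total loss under each criterion
best_abs = candidates[np.argmin(total_abs)]
best_sq = candidates[np.argmin(total_sq)]

print(f"Best constant for absolute loss: {best_abs:.2f} | median: {np.median(data)}")
print(f"Best constant for squared loss:  {best_sq:.2f} | mean:   {np.mean(data)}")
```

The best constant for the absolute loss lands on the median and the best constant for the squared loss lands on the mean, which the outlier has pulled well away from the bulk of the data.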
