Linear Regression with Python

Metrics

When building a model, we need a metric that measures how well it fits the data. A metric gives a numerical score describing model performance. In this chapter, we focus on the most common ones.

We will use the following notation: y_i is the true target value of the i-th instance, ŷ_i is the model's prediction for it, and n is the number of instances.

We are already familiar with one metric, SSR (Sum of Squared Residuals), which we minimized to identify the optimal parameters.
Using our notation, we can express the formula for SSR as follows:

SSR = (y_1 - ŷ_1)² + (y_2 - ŷ_2)² + ... + (y_n - ŷ_n)²

or equally:

SSR = Σᵢ (y_i - ŷ_i)²

This metric is only suitable for comparing models trained on the same number of data points; on its own, it does not show how well a model truly performs. Imagine two models trained on datasets of different sizes.

The first model fits the data better visually, yet it has a higher SSR simply because it sums over more points: with more residuals to add up, the total grows even when the average residual is smaller. Averaging the squared residuals instead fixes this and gives the Mean Squared Error (MSE).
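A quick sanity check with made-up residuals illustrates the point: the model with many small residuals loses on SSR but wins on MSE.

```python
import numpy as np

# Hypothetical residuals: model A has many small errors,
# model B has few but larger ones.
residuals_a = np.full(100, 1.0)   # 100 points, each off by 1
residuals_b = np.full(5, 3.0)     # 5 points, each off by 3

ssr_a = np.sum(residuals_a**2)    # 100.0
ssr_b = np.sum(residuals_b**2)    # 45.0  -> SSR wrongly prefers B

mse_a = np.mean(residuals_a**2)   # 1.0
mse_b = np.mean(residuals_b**2)   # 9.0   -> MSE correctly prefers A
```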

MSE

MSE = SSR / n

or equally:

MSE = (1/n) Σᵢ (y_i - ŷ_i)²

Compute MSE using NumPy:

import numpy as np

mse = np.mean((y_true - y_pred)**2)

Or Scikit-learn:

from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_true, y_pred)
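Both ways give the same result. Here is a small check on toy values (the numbers are made up for illustration):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Toy targets and predictions, chosen only for illustration
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mse_np = np.mean((y_true - y_pred)**2)        # (0.25 + 0 + 4 + 1) / 4 = 1.3125
mse_sk = mean_squared_error(y_true, y_pred)   # same value
```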

MSE is expressed in squared units of the target, which makes it harder to interpret: if the MSE is 49 dollars², we would rather know the error in dollars. Taking the square root gives 7 dollars, the Root Mean Squared Error (RMSE).

RMSE

RMSE = √MSE = √( (1/n) Σᵢ (y_i - ŷ_i)² )

Compute RMSE using NumPy:

import numpy as np

rmse = np.sqrt(np.mean((y_true - y_pred)**2))

Or Scikit-learn (version 1.4 and newer provides a dedicated function; in older versions, pass squared=False to mean_squared_error instead):

from sklearn.metrics import root_mean_squared_error
rmse = root_mean_squared_error(y_true, y_pred)
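Since RMSE is just the square root of MSE, the two computations should agree exactly; the toy values below are made up for illustration:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

rmse = np.sqrt(np.mean((y_true - y_pred)**2))

# RMSE equals the square root of MSE by definition
assert np.isclose(rmse, np.sqrt(mean_squared_error(y_true, y_pred)))
```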

MAE

Instead of squaring residuals, we can take their absolute values, which gives the Mean Absolute Error (MAE):

MAE = (1/n) (|y_1 - ŷ_1| + |y_2 - ŷ_2| + ... + |y_n - ŷ_n|)

or equally:

MAE = (1/n) Σᵢ |y_i - ŷ_i|

MAE behaves like MSE but penalizes large errors less severely. Because it uses absolute values rather than squares, it is more robust to outliers, making it useful when extreme values skew the dataset.

Compute MAE using NumPy:

import numpy as np

mae = np.mean(np.abs(y_true - y_pred))

Or Scikit-learn:

from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y_true, y_pred)
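The robustness claim is easy to demonstrate with hypothetical data: corrupting a single target with a gross outlier inflates MSE far more than MAE.

```python
import numpy as np

# Hypothetical data: predictions stay the same, but one target
# becomes a gross outlier
y_true = np.array([10.0, 12.0, 11.0, 13.0, 12.0])
y_pred = np.array([10.5, 11.5, 11.0, 12.5, 12.0])
y_true_outlier = y_true.copy()
y_true_outlier[0] = 60.0

mse_clean = np.mean((y_true - y_pred)**2)           # 0.15
mse_out = np.mean((y_true_outlier - y_pred)**2)     # 490.15
mae_clean = np.mean(np.abs(y_true - y_pred))        # 0.3
mae_out = np.mean(np.abs(y_true_outlier - y_pred))  # 10.1
```

The single outlier multiplies MSE by more than 3000, but MAE by only about 34, because squaring amplifies the large residual.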

SSR helped us derive the Normal Equation, but any metric can be used when comparing models.

Note

On the same dataset, SSR, MSE, and RMSE always rank models identically, since MSE is just SSR divided by n and RMSE is the square root of MSE. MAE, however, may prefer a different model because it penalizes large errors less. You should pick a metric beforehand and optimize specifically for it.
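A hypothetical example shows how the rankings can diverge: a uniformly mediocre model beats a mostly-perfect model with one big miss under MSE, but loses under MAE.

```python
import numpy as np

# Hypothetical absolute errors of two models on the same 4 points
errors_a = np.array([2.0, 2.0, 2.0, 2.0])  # uniformly mediocre
errors_b = np.array([0.0, 0.0, 0.0, 5.0])  # mostly perfect, one big miss

mse_a, mse_b = np.mean(errors_a**2), np.mean(errors_b**2)  # 4.0 vs 6.25
mae_a, mae_b = np.mean(errors_a), np.mean(errors_b)        # 2.0 vs 1.25

# MSE prefers model A, while MAE prefers model B
```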

With these metrics you can now tell that the second model is better, since all its metrics are lower. However, lower metrics do not always mean the model is better.

SectionΒ 4. ChapterΒ 1
