R-squared
Linear Regression with Python

What is R-squared

We already covered MSE, RMSE, and MAE. They help compare models, but a single score is hard to judge without context. You may not know whether the value is “good enough” for your dataset.

R-squared solves this by measuring how much of the target's variance the model explains. Its value ranges from 0 to 1, making interpretation straightforward:

R² = explained variance / total variance

The problem is that we cannot calculate the explained variance right away. But we can calculate the unexplained variance, so we transform the equation above into:

R² = 1 − unexplained variance / total variance

Total Variance

The total variance is just the target's variance, and we can calculate it using the sample variance formula from statistics (ȳ is the target's mean, m is the number of samples):

total variance = Σ(yᵢ − ȳ)² / (m − 1)

In the example, the differences between actual values and the target mean (orange lines) are squared and summed, then divided by m−1, giving a total variance of 11.07.
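
To make the formula concrete, here is a minimal NumPy sketch of the same calculation. The y array is made-up toy data for illustration, not the values from the plot above.

```python
import numpy as np

# Hypothetical target values (illustration only, not the data from the example)
y = np.array([3.1, 4.5, 2.8, 6.0, 5.2, 7.4])

m = len(y)
y_mean = y.mean()

# Sample variance: squared deviations from the mean, summed and divided by m - 1
total_variance = np.sum((y - y_mean) ** 2) / (m - 1)

# NumPy's built-in sample variance (ddof=1) gives the same result
print(total_variance, np.var(y, ddof=1))
```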

Unexplained Variance

Next, we compute the variance the model does not explain. If the predictions were perfect, all points would lie exactly on the regression line and the unexplained variance would be zero. We compute the same variance formula, but replace the target's mean ȳ with the predicted values ŷᵢ:

unexplained variance = Σ(yᵢ − ŷᵢ)² / (m − 1)
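
Continuing the same sketch, the unexplained variance uses the model's predictions in place of the mean. The y_pred values below are again made up purely for illustration.

```python
import numpy as np

# Hypothetical actual and predicted values (illustration only)
y = np.array([3.1, 4.5, 2.8, 6.0, 5.2, 7.4])
y_pred = np.array([3.0, 4.2, 3.1, 5.8, 5.5, 7.0])

m = len(y)

# Same formula as the total variance, but the mean is replaced by the predictions
unexplained_variance = np.sum((y - y_pred) ** 2) / (m - 1)
print(unexplained_variance)
```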

Here is an example with visualization:

Now we know everything we need to calculate R-squared:

We get an R-squared score of 0.92, which is close to 1, so the model fits the data well. We'll also calculate R-squared for one more model.

This time R-squared is lower because the model slightly underfits the data.
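
Putting the two pieces together, here is a small helper that implements the formula above on toy data. The comparison with scikit-learn's r2_score is only a sanity check, not part of the course example.

```python
import numpy as np
from sklearn.metrics import r2_score

def r_squared(y, y_pred):
    """R-squared = 1 - unexplained variance / total variance."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # The (m - 1) denominators cancel in the ratio, so plain sums are enough
    total = np.sum((y - y.mean()) ** 2)
    unexplained = np.sum((y - y_pred) ** 2)
    return 1 - unexplained / total

# Toy data for illustration only
y = [3.1, 4.5, 2.8, 6.0, 5.2, 7.4]
y_pred = [3.0, 4.2, 3.1, 5.8, 5.5, 7.0]

print(r_squared(y, y_pred))   # manual calculation
print(r2_score(y, y_pred))    # scikit-learn's implementation, for comparison
```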

R-squared in Python

The sm.OLS class calculates R-squared for us. We can find it in the table produced by its summary() method.
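
As a minimal sketch, fitting an OLS model and reading off R-squared might look like this; the data is made up, and only fit(), summary(), and the rsquared attribute are shown.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data for illustration
X = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([3.1, 4.5, 2.8, 6.0, 5.2, 7.4])

X = sm.add_constant(X)          # add the intercept term
model = sm.OLS(y, X).fit()      # fit ordinary least squares

print(model.summary())          # R-squared appears in the top table
print(model.rsquared)           # or access it directly
```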

R-squared ranges from 0 to 1, and higher is better (unless the model overfits). The summary() output of sm.OLS includes the R-squared score.
