R-squared
What is R-squared
We already covered MSE, RMSE, and MAE. They help compare models, but a single score is hard to judge without context. You may not know whether the value is “good enough” for your dataset.
R-squared solves this by measuring how much of the target's variance the model explains. Its value typically ranges from 0 to 1, making interpretation straightforward.
The problem is that we cannot calculate the explained variance directly. We can, however, calculate the unexplained variance, so we transform the equation above into:
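In symbols, the relationship being transformed can be written as:

```latex
R^2 = \frac{\text{explained variance}}{\text{total variance}}
    = 1 - \frac{\text{unexplained variance}}{\text{total variance}}
```

Both forms give the same value; the second one only requires quantities we can compute from the data and the predictions.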
Total Variance
The total variance is simply the target's variance, which we can calculate using the sample variance formula from statistics (ȳ is the target's mean):
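As a formula, the sample variance of a target with m observations is:

```latex
\text{total variance} = \frac{1}{m - 1} \sum_{i=1}^{m} \left( y_i - \bar{y} \right)^2
```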
In the example, the differences between actual values and the target mean (orange lines) are squared and summed, then divided by m−1, giving a total variance of 11.07.
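The same calculation is easy to do by hand. Here is a minimal sketch with a hypothetical target array (the lesson's own data points are not reproduced here, so the resulting value differs from the 11.07 in the example):

```python
import numpy as np

# Hypothetical target values for illustration
y = np.array([3.0, 5.0, 4.0, 8.0, 10.0, 9.0])
m = len(y)
y_mean = y.mean()

# Sample variance: squared deviations from the mean, summed, divided by m - 1
total_variance = ((y - y_mean) ** 2).sum() / (m - 1)
print(total_variance)  # 8.3 for this data
```

This matches `np.var(y, ddof=1)`, since `ddof=1` tells NumPy to divide by m − 1 instead of m.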
Unexplained Variance
Next we compute the variance the model does not explain. If predictions were perfect, all points would lie exactly on the regression line. We compute the same variance formula, but replace ȳ with predicted values.
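Written out, with ŷᵢ denoting the model's prediction for the i-th point, this is:

```latex
\text{unexplained variance} = \frac{1}{m - 1} \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2
```

It is the same sample variance formula as before, with the mean ȳ replaced by the predictions ŷᵢ.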
Here is an example with visualization:
Now we know everything to calculate the R-squared:
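Putting the pieces together, here is a short sketch with hypothetical actual and predicted values (not the lesson's own data, so the score differs from the 0.92 below):

```python
import numpy as np

y = np.array([3.0, 5.0, 4.0, 8.0, 10.0, 9.0])        # actual targets
y_pred = np.array([3.4, 4.7, 4.9, 7.8, 9.6, 9.1])    # hypothetical predictions
m = len(y)

# Total variance: deviations from the target mean
total_variance = ((y - y.mean()) ** 2).sum() / (m - 1)
# Unexplained variance: deviations from the predictions
unexplained_variance = ((y - y_pred) ** 2).sum() / (m - 1)

r_squared = 1 - unexplained_variance / total_variance
print(round(r_squared, 3))  # ≈ 0.969
```

Note that the 1/(m − 1) factors cancel in the ratio, which is why R-squared is often written directly with sums of squares.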
We got an R-squared score of 0.92, which is close to 1, so we have a good model. We'll also calculate R-squared for one more model.
The R-squared is lower because this model slightly underfits the data.
R-squared in Python
The sm.OLS class calculates R-squared for us; it appears in the summary() table.
R-squared ranges from 0 to 1, and higher is better (unless the model overfits). The summary() output of sm.OLS includes the R-squared score.