R-squared
What is R-squared
We already covered MSE, RMSE, and MAE. They help compare models, but a single score is hard to judge without context. You may not know whether the value is "good enough" for your dataset.
R-squared solves this by measuring how much of the target's variance the model explains. Its value ranges from 0 to 1, making interpretation straightforward.
The problem is that we cannot calculate the explained variance directly. We can, however, calculate the unexplained variance, so we transform the equation above into:
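The equation referred to above is the standard definition of R-squared, reconstructed here since the original formula is not shown in the text:

```latex
R^2 = \frac{\text{explained variance}}{\text{total variance}}
    = 1 - \frac{\text{unexplained variance}}{\text{total variance}}
```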
Total Variance
The total variance is just the target's variance, and we can calculate it using the sample variance formula from statistics (ȳ is the target's mean):
In the example, the differences between actual values and the target mean (orange lines) are squared and summed, then divided by m − 1, giving a total variance of 11.07.
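The calculation above can be sketched in a few lines of NumPy. The target values below are hypothetical, since the lesson's actual dataset is not shown:

```python
import numpy as np

# Hypothetical target values (the lesson's actual data is not shown)
y = np.array([3.0, 5.0, 7.0, 4.0, 9.0, 11.0])
m = len(y)

# Sample variance: squared deviations from the mean, divided by m - 1
total_variance = np.sum((y - y.mean()) ** 2) / (m - 1)
print(total_variance)
```

This is equivalent to `np.var(y, ddof=1)`, where `ddof=1` gives the m − 1 divisor.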
Unexplained Variance
Next we compute the variance the model does not explain. If predictions were perfect, all points would lie exactly on the regression line. We compute the same variance formula, but replace ȳ with the predicted values.
Here is an example with visualization:
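The unexplained variance uses the same formula, with the mean swapped for the model's predictions. Again, the values here are hypothetical, chosen only to illustrate the calculation:

```python
import numpy as np

# Hypothetical actual and predicted values (illustrative only)
y = np.array([3.0, 5.0, 7.0, 4.0, 9.0, 11.0])
y_pred = np.array([2.8, 5.3, 6.5, 4.6, 9.2, 10.4])
m = len(y)

# Same variance formula, with the mean replaced by predictions
unexplained_variance = np.sum((y - y_pred) ** 2) / (m - 1)
print(unexplained_variance)
```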
Now we know everything to calculate the R-squared:
We got an R-squared score of 0.92, which is close to 1, so we have a great model. We'll also calculate R-squared for one more model.
The R-squared is lower since the model underfits the data a little bit.
R-squared in Python
The sm.OLS class calculates R-squared for us; we can find it in the summary() table.
R-squared ranges from 0 to 1, and higher is better (unless the model overfits). The summary() output of sm.OLS includes the R-squared score.