Learn R-squared | Metrics to Evaluate the Model
Explore the Linear Regression Using Python

R-squared

Looking back at the previous metrics, it becomes clear that even once we have calculated them, it is hard to tell how good or bad the values are. Here I want to introduce a metric that can be interpreted the same way for any regression task: R-squared, also called the coefficient of determination. It measures how well the model reproduces the observed results. The metric is calculated using the following formula:

$$R^2 = 1 - \frac{RSS}{TSS}$$

Where:

  • RSS = the sum of squares of residuals

  • TSS = the total sum of squares
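Written out, the two sums can be sketched as follows. The notation is an assumption introduced here and is not part of the original text: $y_i$ are the observed values, $\hat{y}_i$ the model's predictions, $\bar{y}$ the mean of the observed values, and $n$ the number of observations.

```latex
RSS = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
\qquad
TSS = \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2
```

TSS depends only on the data, while RSS depends on the model, so the ratio RSS/TSS compares the model's errors with those of a baseline that always predicts the mean.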

Let’s see R-squared for our data:

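# Residuals: actual values minus the model's predictions on the test set
residuals = Y_test - y_test_predicted
RSS = (residuals**2).sum()
TSS = ((Y_test - Y_test.mean())**2).sum()
r2 = 1 - RSS/TSS
print(r2)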
Output:
0.532141840245738

We can say that about 53% of the variability in the target variable is explained by our model, while the remaining 47% is not accounted for.

The best possible value of the coefficient of determination is 1, which is obtained when the predicted values coincide with the actual values, that is, when the residuals, and therefore RSS, are zero. If R-squared is 0, the model explains none of the variation of the response variable around its mean. R-squared can also be negative, which should not alarm you. This happens when the model fits poorly, so that its predictions are worse than simply predicting the mean of the target. The residuals are then large, RSS exceeds TSS, and subtracting RSS/TSS from 1 gives a negative value.
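The three cases can be checked on a tiny example. This is a minimal sketch: the helper `r_squared` and the toy array `y_true` are hypothetical names made up for illustration and do not come from the course data.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Compute R-squared as 1 - RSS/TSS (hypothetical helper for illustration)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rss = ((y_true - y_pred) ** 2).sum()          # sum of squared residuals
    tss = ((y_true - y_true.mean()) ** 2).sum()   # total sum of squares
    return 1 - rss / tss

# Toy data, assumed only for this example
y_true = np.array([1.0, 2.0, 3.0, 4.0])

# Predictions equal to the actual values: RSS = 0, so R-squared = 1
perfect = r_squared(y_true, y_true)

# Always predicting the mean: RSS = TSS, so R-squared = 0
mean_only = r_squared(y_true, np.full(4, y_true.mean()))

# Predictions in reversed order: RSS = 20, TSS = 5, so R-squared = 1 - 4 = -3
worse = r_squared(y_true, y_true[::-1])

print(perfect, mean_only, worse)
```

The last case shows that a negative value simply means the model does worse than the mean baseline.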

We can get R-squared using the .score() method:

print(model.score(X_test, Y_test))

The higher the R-squared, the better the model fits our data.

You can also use a function from sklearn.metrics, as with the metrics in previous chapters:

from sklearn.metrics import r2_score
print(r2_score(Y_test, y_test_predicted))
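All three approaches should give the same number. The sketch below checks this on synthetic data; the data, the random seed, and the variable names `r2_manual`, `r2_method`, and `r2_func` are assumptions made up for illustration, not the course's wine dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (assumed for illustration): a noisy straight line
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 1.0, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# 1) By hand, as 1 - RSS/TSS
rss = ((y_test - y_pred) ** 2).sum()
tss = ((y_test - y_test.mean()) ** 2).sum()
r2_manual = 1 - rss / tss

# 2) With the model's .score() method
r2_method = model.score(X_test, y_test)

# 3) With the r2_score function (true values first, predictions second)
r2_func = r2_score(y_test, y_pred)

print(r2_manual, r2_method, r2_func)
```

Note the argument order in `r2_score`: swapping the true and predicted values generally changes the result, because TSS is computed from the first argument.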
Task


Let’s calculate R-squared using our data. Find it both with the model’s method and with the built-in metrics function.

  1. [Line #31] Apply the model’s .score() method to the test data and assign the result to the variable r_squared_model.
  2. [Line #32] Print r_squared_model.
  3. [Line #34] Import r2_score from sklearn.metrics.
  4. [Line #35] Use the r2_score() function to find R-squared and assign it to the variable r_squared_model_func.
  5. [Line #36] Print r_squared_model_func.

Solution

# Import the libraries
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Load the dataset
wine = load_wine()

# Configure pandas to show all features
pd.set_option('display.max_rows', None, 'display.max_columns', None)

# Define the DataFrame
data = pd.DataFrame(data = wine['data'], columns = wine['feature_names'])

# Define the target (note: this overwrites the 'total_phenols' column with the class labels from wine.target)
data['total_phenols'] = wine.target

# Define the data we will work with
x = data[['flavanoids']]
y = data['total_phenols']

# Build and fit the model
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size = 0.3, random_state = 1)
model = LinearRegression()
model.fit(X_train, Y_train)

y_test_predicted = model.predict(X_test)

# Calculate R-squared
r_squared_model = model.score(X_test, Y_test)
print(r_squared_model)

from sklearn.metrics import r2_score
r_squared_model_func = r2_score(Y_test, y_test_predicted)
print(r_squared_model_func)

Section 4. Chapter 4
# Import the libraries
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Load the dataset
wine = load_wine()

# Configure pandas to show all features
pd.set_option('display.max_rows', None, 'display.max_columns', None)

# Define the DataFrame
data = pd.DataFrame(data = wine['data'], columns = wine['feature_names'])

# Define the target (note: this overwrites the 'total_phenols' column with the class labels from wine.target)
data['total_phenols'] = wine.target

# Define the data we will work with
x = data[['flavanoids']]
y = data['total_phenols']

# Build and fit the model
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size = 0.3, random_state = 1)
model = LinearRegression()
model.fit(X_train, Y_train)

y_test_predicted = model.predict(X_test)

# Calculate R-squared
r_squared_model = ___
___

from ___ import ___
r_squared_model_func = ___
___
