Predict Prices Using Polynomial Regression

For this challenge, you will build the same degree-2 Polynomial Regression as in the previous challenge. This time, however, you need to split the data into a training set and a test set and calculate the RMSE for both. This is required to judge whether the model overfits or underfits.
Here is a reminder of the train_test_split() function you'll want to use:
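The reminder snippet itself is not reproduced on this page, so below is a minimal sketch of the typical call; the toy X and y and the test_size/random_state values are illustrative only.

python
from sklearn.model_selection import train_test_split

# Toy data: 10 feature rows and 10 targets (stand-ins for X_tilde and y)
X = [[i] for i in range(10)]
y = list(range(10))

# Split features and targets into a training part (70%) and a test part (30%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
print(len(X_train), len(X_test))  # 7 3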

And here is a reminder of the mean_squared_error() function needed to calculate RMSE:

python
rmse = mean_squared_error(y_true, y_predicted, squared=False)
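Note: in newer scikit-learn releases (1.4 and later) the squared parameter of mean_squared_error() is deprecated. If you are on such a version, an equivalent call is sketched below:

python
from sklearn.metrics import root_mean_squared_error

rmse = root_mean_squared_error(y_true, y_predicted)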

Now let's move to coding!

Task


  1. Assign the DataFrame containing only the 'age' column of df to the X variable.
  2. Preprocess X using the PolynomialFeatures class.
  3. Split the dataset using the appropriate function from sklearn.
  4. Build and train a model on the training set.
  5. Predict the targets of both the training and the test sets.
  6. Calculate the RMSE for both the training and the test sets.
  7. Print the summary table.

Solution

import pandas as pd
import statsmodels.api as sm
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/houses_poly.csv')
# Assign the variables
X = df[['age']]
y = df['price']
n = 2 # The degree of the Polynomial Regression
# Preprocess X: PolynomialFeatures adds a bias column, 'age', and 'age' squared,
# so sm.OLS does not need add_constant()
X_tilde = PolynomialFeatures(n).fit_transform(X)
# Split the dataset
X_tilde_train, X_tilde_test, y_train, y_test = train_test_split(X_tilde, y, test_size=0.3, random_state=0)
# Build and train the model using training set
model = sm.OLS(y_train, X_tilde_train).fit()
# Evaluate both sets
y_train_pred = model.predict(X_tilde_train)
y_test_pred = model.predict(X_tilde_test)
# Calculate and print RMSE scores for both training and test sets
train_rmse = mean_squared_error(y_train, y_train_pred, squared=False)
test_rmse = mean_squared_error(y_test, y_test_pred, squared=False)
print('Train RMSE:', train_rmse)
print('Test RMSE:', test_rmse)
# Print the summary
print(model.summary())

When you complete the task, you will notice that the test RMSE is even slightly lower than the training RMSE. Usually, models do not perform better on unseen instances. Here the difference is tiny and caused by chance: the dataset is relatively small, and the split happened to give the test set slightly easier-to-predict data points.
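To see that this is just split-to-split variation, you can rerun the same pipeline with a few different random_state values. Below is a minimal sketch reusing the same dataset; the seed range is arbitrary.

python
import pandas as pd
import statsmodels.api as sm
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/houses_poly.csv')
X_tilde = PolynomialFeatures(2).fit_transform(df[['age']])
y = df['price']

# Compare train/test RMSE for several different random splits
for seed in range(5):
    X_train, X_test, y_train, y_test = train_test_split(X_tilde, y, test_size=0.3, random_state=seed)
    model = sm.OLS(y_train, X_train).fit()
    train_rmse = mean_squared_error(y_train, model.predict(X_train), squared=False)
    test_rmse = mean_squared_error(y_test, model.predict(X_test), squared=False)
    print(f'random_state={seed}: train RMSE={train_rmse:.2f}, test RMSE={test_rmse:.2f}')

If the relative order of the two scores flips between seeds, the gap observed above is noise from the split rather than a sign of underfitting or overfitting.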


Section 4. Chapter 4

Starter code (fill in the blanks):

import pandas as pd
import statsmodels.api as sm
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/houses_poly.csv')
# Assign the variables
X = df[['___']]
y = df['price']
n = 2 # A degree of Polynomial Regression
# Preprocess X
X_tilde = ___(___).fit_transform(X)
# Split the dataset
X_tilde_train, X_tilde_test, y_train, y_test = ___(X_tilde, y, test_size=0.3, random_state=0)
# Build and train the model using training set
model = sm.OLS(___, ___).fit()
# Evaluate both sets
y_train_pred = model.___(X_tilde_train)
y_test_pred = model.predict(___)
# Calculate and print RMSE scores for both training and test sets
train_rmse = mean_squared_error(___, ___, squared=False)
test_rmse = mean_squared_error(y_test, ___, ___=False)
print('Train RMSE:', train_rmse)
print('Test RMSE:', test_rmse)
# Print the summary
print(___.summary())
