Course Content

Linear Regression with Python

1. Simple Linear Regression

What is Linear Regression
Finding the Parameters
Building Linear Regression Using NumPy
Building Linear Regression Using Statsmodels
Predict House Prices

2. Multiple Linear Regression

Linear Regression with Two Features
Linear Regression with n Features
Building Multiple Linear Regression
Choosing the Features
Predict Prices Using Two Features

3. Polynomial Regression

Quadratic Regression
Polynomial Regression
Building Polynomial Regression
Interpolation vs Extrapolation
Evaluate the Model

4. Choosing The Best Model

Metrics
Overfitting
R-squared
Predict Prices Using Polynomial Regression

Building Polynomial Regression

Loading the file

For this chapter, we have a file named poly.csv. Let's load the file and look at the contents.


import pandas as pd

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)  # Load poly.csv into a DataFrame

print(df.head(5))  # Show the first five rows

So the data has one feature (the Feature column) and the target (the Target column). Let's build a scatter plot.


import pandas as pd
import matplotlib.pyplot as plt

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)
X = df['Feature']  # Feature values
y = df['Target']  # Target values
plt.scatter(X, y)  # Scatter plot of the target against the feature
plt.show()

It is hard to imagine a straight line fitting this data well. So let's build a Polynomial Regression!

Building the X̃ matrix

We will once again use the OLS class, but first we need to create the X̃ matrix. We can do it manually by adding a column with the squared values of Feature to the DataFrame.

But if we want to build a high-degree polynomial regression, this approach requires adding many such columns. Luckily, Scikit-Learn provides a less painful way to do it: the PolynomialFeatures class from sklearn.preprocessing.
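
A small illustration of what PolynomialFeatures produces, using three made-up feature values (not from poly.csv). For a single feature and degree 3, each row becomes 1, x, x², x³.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Three made-up feature values as a 2-d column array
X = np.array([[1.0], [2.0], [3.0]])

# Degree 3: the output columns are 1, x, x^2, x^3
X_tilde = PolynomialFeatures(3).fit_transform(X)
print(X_tilde)
# [[ 1.  1.  1.  1.]
#  [ 1.  2.  4.  8.]
#  [ 1.  3.  9. 27.]]
```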

Its fit_transform(X) method expects X to be either a 2-d array or a pandas DataFrame. If your X is a 1-d NumPy array, the reshape(-1,1) method transforms it into a 2-d array (a single column) with the same contents.
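
A quick sketch of the reshape step with a made-up 1-d array. The try/except part is only there to show that a 1-d array is rejected by fit_transform().

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([0.1, 0.4, 0.9])  # made-up 1-d array, shape (3,)
X_2d = X.reshape(-1, 1)        # same values as a single column, shape (3, 1)
print(X.shape, X_2d.shape)     # (3,) (3, 1)

# Passing the 1-d array directly raises a ValueError
try:
    PolynomialFeatures(2).fit_transform(X)
except ValueError as err:
    print('1-d input rejected:', err)

# The 2-d version works: columns are 1, x, x^2
X_tilde = PolynomialFeatures(2).fit_transform(X_2d)
print(X_tilde.shape)           # (3, 3)
```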

If your X is a column of a DataFrame, use X = df[['col1']] to get a DataFrame instead of a pandas Series, which is not suitable for fit_transform().
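
A sketch contrasting single and double brackets on a hypothetical DataFrame with a column named col1 (the name used in the text above). Single brackets give a 1-d Series; double brackets give a one-column DataFrame that fit_transform() accepts.

```python
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

df = pd.DataFrame({'col1': [1.0, 2.0, 3.0]})  # hypothetical data

print(type(df['col1']).__name__, df['col1'].shape)      # Series (3,)
print(type(df[['col1']]).__name__, df[['col1']].shape)  # DataFrame (3, 1)

X = df[['col1']]  # a 2-d DataFrame with one column
X_tilde = PolynomialFeatures(2).fit_transform(X)
print(X_tilde.shape)  # (3, 3)
```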

So, to build X̃ for a Polynomial Regression of degree n, we would call PolynomialFeatures(n).fit_transform(X).
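
A sketch of this call for a general degree n, using a made-up Feature column. It checks the result against np.vander(..., increasing=True), which builds the same powers 1, x, ..., xⁿ, so X̃ has n + 1 columns.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

df = pd.DataFrame({'Feature': [0.0, 0.5, 1.0, 1.5]})  # hypothetical data
n = 4                  # degree of the polynomial
X = df[['Feature']]    # 2-d DataFrame

X_tilde = PolynomialFeatures(n).fit_transform(X)
print(X_tilde.shape)   # (4, 5): columns 1, x, x^2, x^3, x^4

# Same matrix built from powers of the feature directly
print(np.allclose(X_tilde, np.vander(df['Feature'], n + 1, increasing=True)))  # True
```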

Note

By default, the PolynomialFeatures class also adds a column of 1s (controlled by its include_bias parameter), so you do not need to use sm.add_constant().

Building the Polynomial Regression and making the predictions

Now that we know how to get X̃, we can build the Polynomial Regression the same way as the previous models: pass y and X̃ to sm.OLS() and call fit().

To predict new values, X_new must also be transformed with PolynomialFeatures, using the same degree n.

The following example shows the entire process of building a polynomial regression. Here, X_new is a 1-d array of 80 evenly spaced points between -0.1 and 1.5, used to draw the regression curve. Since it is a 1-d array, we apply the reshape(-1,1) method before passing it to PolynomialFeatures.


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn.preprocessing import PolynomialFeatures  # Import the PolynomialFeatures class

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)
n = 2  # Degree of the polynomial regression
X = df[['Feature']]  # Assign X as a DataFrame (2-d)
y = df['Target']  # Assign y
X_tilde = PolynomialFeatures(n).fit_transform(X)  # Get X_tilde (includes the column of 1s)
regression_model = sm.OLS(y, X_tilde).fit()  # Initialize and train the model
X_new = np.linspace(-0.1, 1.5, 80).reshape(-1, 1)  # 2-d array of new feature values
X_new_tilde = PolynomialFeatures(n).fit_transform(X_new)  # Transform X_new for the predict() method
y_pred = regression_model.predict(X_new_tilde)  # Predict the target for each new value
plt.scatter(X, y)  # Build a scatter plot of the data
plt.plot(X_new, y_pred)  # Plot the Polynomial Regression curve
plt.show()

Feel free to play with the value of n (the line n = 2). You will see how the plot changes depending on the degree of the polynomial regression. If you pay attention, you may notice how different the predictions are for feature values lower than 0 or greater than 1.4. That is the subject of the next chapter, Interpolation vs Extrapolation.


Section 3. Chapter 3
