Building Polynomial Regression
Loading file
For this chapter, we have a file named `poly.csv`. Let's load the file and look at its contents.
```python
import pandas as pd

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)
print(df.head(5))
```
So here we have one feature and the target. Let's build a scatter plot.
```python
import pandas as pd
import matplotlib.pyplot as plt

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)
X = df['Feature']
y = df['Target']
plt.scatter(X, y)
plt.show()
```
It is hard to imagine a straight line fitting this data well. So let's build a Polynomial Regression!
Building the X̃ matrix
We will once again use the `OLS` class. Still, we need to create an X̃ matrix. We could do it manually by adding a squared `Feature` column to the DataFrame like this:
```python
df['Feature_squared'] = df['Feature'] ** 2
```
But building a high-degree polynomial regression this way would require adding many such columns. Luckily, Scikit-Learn provides a less painful way: the `PolynomialFeatures` class.
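As a small illustration, the sketch below compares `PolynomialFeatures` of degree 3 with a hand-built matrix of powers x⁰, x¹, x², x³ for two made-up feature values. Both produce the same matrix, but `PolynomialFeatures` does it in one call for any degree.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x = np.array([[2.0], [3.0]])  # 2 samples, 1 feature, shape (2, 1)

poly = PolynomialFeatures(3)  # degree 3
X_tilde = poly.fit_transform(x)

# The same matrix built by hand: columns x^0, x^1, x^2, x^3
manual = np.hstack([x ** k for k in range(4)])

print(X_tilde)  # rows: [1, 2, 4, 8] and [1, 3, 9, 27]
```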
The `fit_transform(X)` method of `PolynomialFeatures` expects `X` to be either a 2-D array or a pandas DataFrame.
If your `X` is a 1-D NumPy array, the `reshape(-1, 1)` method turns it into a 2-D array (a single column) with the same contents:
```python
X = X.reshape(-1, 1)
```
If your `X` is a column of a DataFrame, use double brackets, `X = df[['col1']]`, to get a DataFrame instead of a pandas Series, which is not suited for `fit_transform()`:
```python
X = df['Feature']    # X is a pandas Series
X = df[['Feature']]  # X is a pandas DataFrame
```
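The shapes make the difference concrete. In this sketch (made-up values, hypothetical names `arr` and `arr_2d`), a 1-D array and a single-bracket column are both 1-D, while `reshape(-1, 1)` and double brackets give the 2-D, single-column shape that `fit_transform()` expects. The `-1` tells NumPy to infer the number of rows.

```python
import numpy as np
import pandas as pd

arr = np.array([0.1, 0.2, 0.3])
print(arr.shape)     # (3,): 1-D
arr_2d = arr.reshape(-1, 1)
print(arr_2d.shape)  # (3, 1): 2-D, 3 rows and 1 column

df = pd.DataFrame({'Feature': [0.1, 0.2, 0.3]})
print(type(df['Feature']).__name__, df['Feature'].shape)      # Series (3,)
print(type(df[['Feature']]).__name__, df[['Feature']].shape)  # DataFrame (3, 1)
```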
So, to build an X̃ for a polynomial regression of degree `n`, we would use the code below. For a single feature x, the resulting X̃ has n + 1 columns: 1, x, x², ..., xⁿ.
```python
from sklearn.preprocessing import PolynomialFeatures  # Import the class

poly = PolynomialFeatures(n)  # Initialize a PolynomialFeatures object
X_tilde = poly.fit_transform(X)
```
Note
The `PolynomialFeatures` class also adds a column of 1s (the bias term), so you do not need to use `sm.add_constant()`.
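The sketch below confirms that the first column is all 1s. It also shows the `include_bias=False` option, which drops that column. This could be useful if you would rather add the constant yourself with `sm.add_constant()`. The input values are made up.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x = np.array([[0.5], [2.0]])  # hypothetical feature values

with_bias = PolynomialFeatures(2).fit_transform(x)                     # columns: 1, x, x^2
no_bias = PolynomialFeatures(2, include_bias=False).fit_transform(x)  # columns: x, x^2

print(with_bias[:, 0])                 # first column is all 1s
print(with_bias.shape, no_bias.shape)  # (2, 3) (2, 2)
```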
Building the Polynomial Regression and making the predictions
Knowing how to get an X̃, we are ready to build the Polynomial Regression the same way as the prior models:
```python
import statsmodels.api as sm
from sklearn.preprocessing import PolynomialFeatures

y = df['Target']
# Prepare X_tilde
X = df[['Feature']]
X_tilde = PolynomialFeatures(n).fit_transform(X)
# Initialize the OLS object and train it
regression_model = sm.OLS(y, X_tilde).fit()
```
To predict new values, `X_new` must also be transformed with `PolynomialFeatures`:
```python
X_new_tilde = PolynomialFeatures(n).fit_transform(X_new)
y_pred = regression_model.predict(X_new_tilde)
```
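Creating a fresh `PolynomialFeatures(n)` for `X_new` works because this transformer learns nothing data-dependent beyond the number of input features. A common alternative is to fit one transformer and reuse it via `transform()`. The sketch below, on hypothetical NumPy inputs, shows that both routes give the same matrix.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[0.0], [0.5], [1.0]])            # hypothetical training feature
X_new = np.linspace(-0.1, 1.5, 5).reshape(-1, 1)

poly = PolynomialFeatures(2)
X_tilde = poly.fit_transform(X)      # fit on the training feature
X_new_tilde = poly.transform(X_new)  # reuse the same fitted transformer

# A freshly fitted transformer on X_new yields the same matrix
same = np.allclose(X_new_tilde, PolynomialFeatures(2).fit_transform(X_new))
print(same)  # True
```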
The following runnable example shows the entire process of building a polynomial regression. Here, `X_new` is a 1-D array of 80 points between -0.1 and 1.5, used only to draw the regression curve. Since it is a 1-D array, we apply the `reshape(-1, 1)` method before passing it to the `PolynomialFeatures` class.
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn.preprocessing import PolynomialFeatures  # Import PolynomialFeatures class
file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)
n = 2  # Degree of the polynomial regression
X = df[['Feature']]  # Assign X as a DataFrame
y = df['Target']  # Assign y
X_tilde = PolynomialFeatures(n).fit_transform(X)  # Get X_tilde
regression_model = sm.OLS(y, X_tilde).fit()  # Initialize and train the model
X_new = np.linspace(-0.1, 1.5, 80).reshape(-1, 1)  # 2-D array of new feature values
X_new_tilde = PolynomialFeatures(n).fit_transform(X_new)  # Transform X_new for the predict() method
y_pred = regression_model.predict(X_new_tilde)
plt.scatter(X, y)  # Build a scatter plot
plt.plot(X_new, y_pred)  # Build the polynomial regression curve
plt.show()
```
Feel free to experiment with the value of `n` (set by `n = 2` in the example above). You will see how the plot changes depending on the degree of the polynomial regression. You may also notice how much the predictions differ between degrees for feature values below 0 or above 1.4. That is the subject of the next chapter.