Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Learn Building Polynomial Regression | Polynomial Regression
Linear Regression with Python

bookBuilding Polynomial Regression

Loading File

We load poly.csv and inspect it:

1234
import pandas as pd file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv' df = pd.read_csv(file_link) print(df.head())
copy

Then visualize the relation:

12345
import matplotlib.pyplot as plt X = df['Feature'] y = df['Target'] plt.scatter(X, y) plt.show()
copy

A straight line fits poorly, so Polynomial Regression is more suitable.

Building X̃ Matrix

To create X̃, we could add squared features manually:

df['Feature_squared'] = df['Feature'] ** 2

But for higher degrees, PolynomialFeatures is easier. It requires a 2-D structure:

from sklearn.preprocessing import PolynomialFeatures
X = df[['Feature']]
poly = PolynomialFeatures(n)
X_tilde = poly.fit_transform(X)

It also adds the constant column, so no sm.add_constant() needed.

If X is 1-D, convert it:

X = X.reshape(-1, 1)

Building the Polynomial Regression

import statsmodels.api as sm
y = df['Target']
X = df[['Feature']]
X_tilde = PolynomialFeatures(n).fit_transform(X)
model = sm.OLS(y, X_tilde).fit()

Predicting requires transforming new data the same way:

X_new_tilde = PolynomialFeatures(n).fit_transform(X_new)
y_pred = model.predict(X_new_tilde)

Full Example

123456789101112131415161718
import pandas as pd, numpy as np, matplotlib.pyplot as plt import statsmodels.api as sm from sklearn.preprocessing import PolynomialFeatures df = pd.read_csv(file_link) n = 2 X = df[['Feature']] y = df['Target'] X_tilde = PolynomialFeatures(n).fit_transform(X) model = sm.OLS(y, X_tilde).fit() X_new = np.linspace(-0.1, 1.5, 80).reshape(-1,1) X_new_tilde = PolynomialFeatures(n).fit_transform(X_new) y_pred = model.predict(X_new_tilde) plt.scatter(X, y) plt.plot(X_new, y_pred) plt.show()
copy

Try different n values to see how the curve changes and how predictions behave outside the original feature rangeβ€”this leads into the next chapter.

question mark

Consider the following code. In which case will the code run without errors?

Select the correct answer

Everything was clear?

How can we improve it?

Thanks for your feedback!

SectionΒ 3. ChapterΒ 3

Ask AI

expand

Ask AI

ChatGPT

Ask anything or try one of the suggested questions to begin our chat

Suggested prompts:

What does the `PolynomialFeatures` class do in this context?

How do I choose the best degree `n` for my polynomial regression?

Can you explain why a straight line fits poorly in this example?

Awesome!

Completion rate improved to 5.26

bookBuilding Polynomial Regression

Swipe to show menu

Loading File

We load poly.csv and inspect it:

1234
import pandas as pd file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv' df = pd.read_csv(file_link) print(df.head())
copy

Then visualize the relation:

12345
import matplotlib.pyplot as plt X = df['Feature'] y = df['Target'] plt.scatter(X, y) plt.show()
copy

A straight line fits poorly, so Polynomial Regression is more suitable.

Building X̃ Matrix

To create X̃, we could add squared features manually:

df['Feature_squared'] = df['Feature'] ** 2

But for higher degrees, PolynomialFeatures is easier. It requires a 2-D structure:

from sklearn.preprocessing import PolynomialFeatures
X = df[['Feature']]
poly = PolynomialFeatures(n)
X_tilde = poly.fit_transform(X)

It also adds the constant column, so no sm.add_constant() needed.

If X is 1-D, convert it:

X = X.reshape(-1, 1)

Building the Polynomial Regression

import statsmodels.api as sm
y = df['Target']
X = df[['Feature']]
X_tilde = PolynomialFeatures(n).fit_transform(X)
model = sm.OLS(y, X_tilde).fit()

Predicting requires transforming new data the same way:

X_new_tilde = PolynomialFeatures(n).fit_transform(X_new)
y_pred = model.predict(X_new_tilde)

Full Example

123456789101112131415161718
import pandas as pd, numpy as np, matplotlib.pyplot as plt import statsmodels.api as sm from sklearn.preprocessing import PolynomialFeatures df = pd.read_csv(file_link) n = 2 X = df[['Feature']] y = df['Target'] X_tilde = PolynomialFeatures(n).fit_transform(X) model = sm.OLS(y, X_tilde).fit() X_new = np.linspace(-0.1, 1.5, 80).reshape(-1,1) X_new_tilde = PolynomialFeatures(n).fit_transform(X_new) y_pred = model.predict(X_new_tilde) plt.scatter(X, y) plt.plot(X_new, y_pred) plt.show()
copy

Try different n values to see how the curve changes and how predictions behave outside the original feature rangeβ€”this leads into the next chapter.

question mark

Consider the following code. In which case will the code run without errors?

Select the correct answer

Everything was clear?

How can we improve it?

Thanks for your feedback!

SectionΒ 3. ChapterΒ 3
some-alt