Building Polynomial Regression

Loading File

We load poly.csv and inspect it:

import pandas as pd
file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)
print(df.head())

Then visualize the relation:

import matplotlib.pyplot as plt
X = df['Feature']
y = df['Target']
plt.scatter(X, y)
plt.show()

A straight line fits poorly, so Polynomial Regression is more suitable.

Building X̃ Matrix

To create X̃, we could add the squared feature manually:

df['Feature_squared'] = df['Feature'] ** 2

For higher degrees, scikit-learn's PolynomialFeatures is easier. It takes the polynomial degree n as its first argument and expects 2-D input, such as a one-column DataFrame:

from sklearn.preprocessing import PolynomialFeatures
n = 2  # polynomial degree
X = df[['Feature']]  # double brackets keep X two-dimensional
poly = PolynomialFeatures(n)
X_tilde = poly.fit_transform(X)

PolynomialFeatures also adds the constant column of ones (include_bias=True by default), so sm.add_constant() is not needed.
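As a quick check of what the transformer produces, the sketch below expands a tiny made-up feature column (the values 1, 2, 3 are illustrative and not taken from poly.csv) and compares the result with a matrix built by hand:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Toy feature values (made up for illustration), shaped as a 2-D column
X_demo = np.array([[1.0], [2.0], [3.0]])

# Degree-2 expansion: the columns are 1, x, x^2
X_tilde_demo = PolynomialFeatures(2).fit_transform(X_demo)
print(X_tilde_demo)

# The same matrix built by hand: constant, feature, squared feature
manual = np.column_stack([np.ones(3), X_demo[:, 0], X_demo[:, 0] ** 2])
print(np.allclose(X_tilde_demo, manual))  # True
```

The first column is the constant term, which is why no separate add_constant step is required before fitting.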

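If X is 1-D, convert it to a single column. A NumPy array can be reshaped directly, but a pandas Series such as df['Feature'] has no reshape method in current pandas versions, so convert it to an array first:

import numpy as np
X = np.asarray(X).reshape(-1, 1)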
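To make the 1-D versus 2-D distinction concrete, here is a small sketch on a made-up DataFrame (df_demo and its values are illustrative only). It prints the shape of a single-bracket selection, a double-bracket selection, and a Series converted to an array and reshaped:

```python
import pandas as pd

# Toy data, made up for illustration
df_demo = pd.DataFrame({'Feature': [0.1, 0.5, 0.9]})

series_1d = df_demo['Feature']                   # Series: 1-D
frame_2d = df_demo[['Feature']]                  # DataFrame: 2-D, one column
array_2d = series_1d.to_numpy().reshape(-1, 1)   # ndarray: 2-D, one column

print(series_1d.shape, frame_2d.shape, array_2d.shape)
# (3,) (3, 1) (3, 1)
```

Either of the two 2-D forms can be passed to PolynomialFeatures; the 1-D Series cannot.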

Building the Polynomial Regression

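import statsmodels.api as sm
from sklearn.preprocessing import PolynomialFeatures

n = 2  # polynomial degree
y = df['Target']
X = df[['Feature']]
X_tilde = PolynomialFeatures(n).fit_transform(X)  # columns: 1, x, ..., x^n
model = sm.OLS(y, X_tilde).fit()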

To predict, transform the new data with the same degree n before passing it to the model:

X_new_tilde = PolynomialFeatures(n).fit_transform(X_new)
y_pred = model.predict(X_new_tilde)

Full Example

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.preprocessing import PolynomialFeatures

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)

n = 2  # polynomial degree
X = df[['Feature']]
y = df['Target']

# Build the X̃ matrix and fit the model
X_tilde = PolynomialFeatures(n).fit_transform(X)
model = sm.OLS(y, X_tilde).fit()

# Predict on an evenly spaced grid of new feature values
X_new = np.linspace(-0.1, 1.5, 80).reshape(-1, 1)
X_new_tilde = PolynomialFeatures(n).fit_transform(X_new)
y_pred = model.predict(X_new_tilde)

# Plot the data and the fitted curve
plt.scatter(X, y)
plt.plot(X_new, y_pred)
plt.show()

Try different values of n to see how the curve changes and how the predictions behave outside the original feature range. This leads into the next chapter.

Section 1. Chapter 13
