Building Polynomial Regression
Loading File
We load poly.csv and inspect it:
1234import pandas as pd file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv' df = pd.read_csv(file_link) print(df.head())
Then visualize the relation:
12345import matplotlib.pyplot as plt X = df['Feature'] y = df['Target'] plt.scatter(X, y) plt.show()
A straight line fits poorly, so Polynomial Regression is more suitable.
Building X̃ Matrix
To create X̃, we could add squared features manually:
df['Feature_squared'] = df['Feature'] ** 2
But for higher degrees, PolynomialFeatures is easier. It requires a 2-D structure:
from sklearn.preprocessing import PolynomialFeatures
X = df[['Feature']]
poly = PolynomialFeatures(n)
X_tilde = poly.fit_transform(X)
It also adds the constant column, so no sm.add_constant() needed.
If X is 1-D, convert it:
X = X.reshape(-1, 1)
Building the Polynomial Regression
import statsmodels.api as sm
y = df['Target']
X = df[['Feature']]
X_tilde = PolynomialFeatures(n).fit_transform(X)
model = sm.OLS(y, X_tilde).fit()
Predicting requires transforming new data the same way:
X_new_tilde = PolynomialFeatures(n).fit_transform(X_new)
y_pred = model.predict(X_new_tilde)
Full Example
123456789101112131415161718import pandas as pd, numpy as np, matplotlib.pyplot as plt import statsmodels.api as sm from sklearn.preprocessing import PolynomialFeatures df = pd.read_csv(file_link) n = 2 X = df[['Feature']] y = df['Target'] X_tilde = PolynomialFeatures(n).fit_transform(X) model = sm.OLS(y, X_tilde).fit() X_new = np.linspace(-0.1, 1.5, 80).reshape(-1,1) X_new_tilde = PolynomialFeatures(n).fit_transform(X_new) y_pred = model.predict(X_new_tilde) plt.scatter(X, y) plt.plot(X_new, y_pred) plt.show()
Try different n values to see how the curve changes and how predictions behave outside the original feature range—this leads into the next chapter.
Kiitos palautteestasi!
Kysy tekoälyä
Kysy tekoälyä
Kysy mitä tahansa tai kokeile jotakin ehdotetuista kysymyksistä aloittaaksesi keskustelumme
Mahtavaa!
Completion arvosana parantunut arvoon 6.67
Building Polynomial Regression
Pyyhkäise näyttääksesi valikon
Loading File
We load poly.csv and inspect it:
1234import pandas as pd file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv' df = pd.read_csv(file_link) print(df.head())
Then visualize the relation:
12345import matplotlib.pyplot as plt X = df['Feature'] y = df['Target'] plt.scatter(X, y) plt.show()
A straight line fits poorly, so Polynomial Regression is more suitable.
Building X̃ Matrix
To create X̃, we could add squared features manually:
df['Feature_squared'] = df['Feature'] ** 2
But for higher degrees, PolynomialFeatures is easier. It requires a 2-D structure:
from sklearn.preprocessing import PolynomialFeatures
X = df[['Feature']]
poly = PolynomialFeatures(n)
X_tilde = poly.fit_transform(X)
It also adds the constant column, so no sm.add_constant() needed.
If X is 1-D, convert it:
X = X.reshape(-1, 1)
Building the Polynomial Regression
import statsmodels.api as sm
y = df['Target']
X = df[['Feature']]
X_tilde = PolynomialFeatures(n).fit_transform(X)
model = sm.OLS(y, X_tilde).fit()
Predicting requires transforming new data the same way:
X_new_tilde = PolynomialFeatures(n).fit_transform(X_new)
y_pred = model.predict(X_new_tilde)
Full Example
123456789101112131415161718import pandas as pd, numpy as np, matplotlib.pyplot as plt import statsmodels.api as sm from sklearn.preprocessing import PolynomialFeatures df = pd.read_csv(file_link) n = 2 X = df[['Feature']] y = df['Target'] X_tilde = PolynomialFeatures(n).fit_transform(X) model = sm.OLS(y, X_tilde).fit() X_new = np.linspace(-0.1, 1.5, 80).reshape(-1,1) X_new_tilde = PolynomialFeatures(n).fit_transform(X_new) y_pred = model.predict(X_new_tilde) plt.scatter(X, y) plt.plot(X_new, y_pred) plt.show()
Try different n values to see how the curve changes and how predictions behave outside the original feature range—this leads into the next chapter.
Kiitos palautteestasi!