Building Polynomial Regression
Loading File
We load poly.csv and inspect it:
import pandas as pd

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)
print(df.head())
Then visualize the relation:
import matplotlib.pyplot as plt

X = df['Feature']
y = df['Target']
plt.scatter(X, y)
plt.show()
A straight line fits poorly, so Polynomial Regression is more suitable.
Building the X̃ Matrix
To create X̃, we could add squared features manually:
df['Feature_squared'] = df['Feature'] ** 2
For higher degrees, the PolynomialFeatures class is easier. It takes the polynomial degree n and requires a 2-D input, such as a DataFrame selected with double brackets:
from sklearn.preprocessing import PolynomialFeatures
X = df[['Feature']]
poly = PolynomialFeatures(n)
X_tilde = poly.fit_transform(X)
PolynomialFeatures also adds the constant column (include_bias=True is the default), so sm.add_constant() is not needed.
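To make the shape of X̃ concrete, here is a minimal sketch on a toy array. The names X_demo and X_demo_tilde and the values are illustrative assumptions, not part of poly.csv.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Toy single-feature input (illustrative values, not taken from poly.csv)
X_demo = np.array([[1.0], [2.0], [3.0]])

# Degree 2: each row becomes [1, x, x**2]
X_demo_tilde = PolynomialFeatures(2).fit_transform(X_demo)

print(X_demo_tilde)
# [[1. 1. 1.]
#  [1. 2. 4.]
#  [1. 3. 9.]]
```

The first column is the constant term, which is why no extra call to sm.add_constant() is required.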
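If X is a 1-D NumPy array, convert it to 2-D with reshape. A pandas Series has no reshape method, so call X.to_numpy() first:
X = X.reshape(-1, 1)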
Building the Polynomial Regression
import statsmodels.api as sm
y = df['Target']
X = df[['Feature']]
X_tilde = PolynomialFeatures(n).fit_transform(X)
model = sm.OLS(y, X_tilde).fit()
Predicting requires transforming new data the same way:
X_new_tilde = PolynomialFeatures(n).fit_transform(X_new)
y_pred = model.predict(X_new_tilde)
Full Example
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn.preprocessing import PolynomialFeatures

# Same file as in the loading step
file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)
n = 2  # polynomial degree
X = df[['Feature']]
y = df['Target']
X_tilde = PolynomialFeatures(n).fit_transform(X)
model = sm.OLS(y, X_tilde).fit()

# Evenly spaced points for drawing the fitted curve
X_new = np.linspace(-0.1, 1.5, 80).reshape(-1, 1)
X_new_tilde = PolynomialFeatures(n).fit_transform(X_new)
y_pred = model.predict(X_new_tilde)

plt.scatter(X, y)
plt.plot(X_new, y_pred)
plt.show()
Try different values of n to see how the curve changes and how predictions behave outside the original feature range. This leads into the next chapter.