Building Polynomial Regression
Loading File
We load poly.csv and inspect it:
import pandas as pd

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)
print(df.head())
Then visualize the relation:
import matplotlib.pyplot as plt

X = df['Feature']
y = df['Target']
plt.scatter(X, y)
plt.show()
A straight line fits poorly, so Polynomial Regression is more suitable.
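To see this for yourself, you can fit a plain LinearRegression on the raw feature and plot it over the scatter. This is a minimal sketch that assumes df has already been loaded as above:

from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Fit a straight line to the raw feature (no polynomial terms)
lin = LinearRegression()
lin.fit(df[['Feature']], df['Target'])

# Plot the data and the linear fit for comparison
plt.scatter(df['Feature'], df['Target'], label='Data')
plt.plot(df['Feature'], lin.predict(df[['Feature']]), color='red', label='Linear fit')
plt.legend()
plt.show()

The straight line cuts through the curved pattern instead of following it, which motivates adding polynomial features.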
Building Transformed Matrix
To create polynomial features, we could add squared features manually:
df['Feature_squared'] = df['Feature'] ** 2
But for higher degrees, the PolynomialFeatures class from sklearn.preprocessing is much easier and more efficient. It requires a 2-D structure (DataFrame or 2-D array):
from sklearn.preprocessing import PolynomialFeatures
X = df[['Feature']]
# Create the transformer
poly = PolynomialFeatures(degree=2, include_bias=False)
# Transform the data
X_poly = poly.fit_transform(X)
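You can inspect the transformed matrix to confirm what was generated. A short sketch, assuming poly and X from the snippet above (get_feature_names_out is available in recent scikit-learn versions):

# Inspect the generated columns: the original feature and its square
print(X_poly.shape)                  # (n_samples, 2) for degree=2 without bias
print(poly.get_feature_names_out())  # ['Feature', 'Feature^2']
print(X_poly[:3])                    # first three transformed rows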
Parameters
The PolynomialFeatures class has several important parameters:
- degree (default=2): the degree of the polynomial features. For example, with a single feature x and degree=3, it generates x, x^2, x^3 (plus a bias column if include_bias=True).
- interaction_only (default=False): if True, only interaction features are produced (e.g., x1 * x2), avoiding pure powers like x1^2 or x2^2.
- include_bias (default=True): if True, it adds a column of ones (the bias column).

A short demonstration of these parameters follows the note below.
Important: since LinearRegression calculates the intercept automatically, we usually set include_bias=False to avoid redundancy.
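A quick way to see what each parameter does is to transform a tiny two-feature array and compare the outputs. The array below is made up purely for illustration:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_demo = np.array([[2.0, 3.0]])  # one sample, two features: x1=2, x2=3

# degree=2 with bias: 1, x1, x2, x1^2, x1*x2, x2^2
full = PolynomialFeatures(degree=2, include_bias=True)
print(full.fit_transform(X_demo))   # [[1. 2. 3. 4. 6. 9.]]

# interaction_only=True drops the pure powers, keeping 1, x1, x2, x1*x2
inter = PolynomialFeatures(degree=2, interaction_only=True, include_bias=True)
print(inter.fit_transform(X_demo))  # [[1. 2. 3. 6.]]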
Building the Polynomial Regression
Once we have the transformed features (X_poly), we can use the standard LinearRegression model.
from sklearn.linear_model import LinearRegression
y = df['Target']
# Initialize and train the model
model = LinearRegression()
model.fit(X_poly, y)
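After fitting, you can read off the learned parameters; with degree=2 and include_bias=False there is one coefficient per generated column (Feature and Feature^2). A short sketch:

# The fitted model is approximately: y = intercept + coef[0]*x + coef[1]*x^2
print(model.intercept_)
print(model.coef_)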
Predicting requires transforming the new data using the same transformer instance before passing it to the model:
# Transform new data
X_new_poly = poly.transform(X_new)
# Predict
y_pred = model.predict(X_new_poly)
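If you prefer not to manage the transformer separately, scikit-learn's make_pipeline can chain PolynomialFeatures and LinearRegression into a single estimator, so new data is transformed automatically inside predict. A minimal sketch, assuming df is loaded as above:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Chain the transform and the regression into one estimator
poly_model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression()
)
poly_model.fit(df[['Feature']], df['Target'])

# predict() applies the polynomial transform internally
y_pred = poly_model.predict(df[['Feature']])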
Full Example
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Load data
file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)
X = df[['Feature']]
y = df['Target']

# 1. Create Polynomial Features
n = 2
poly = PolynomialFeatures(degree=n, include_bias=False)
X_poly = poly.fit_transform(X)

# 2. Train Linear Regression
model = LinearRegression()
model.fit(X_poly, y)

# 3. Predict on new data
X_new = np.linspace(-0.1, 1.5, 80).reshape(-1, 1)
X_new_poly = poly.transform(X_new)
y_pred = model.predict(X_new_poly)

# Visualization
plt.scatter(X, y, label='Data')
plt.plot(X_new, y_pred, color='red', label=f'Degree {n}')
plt.legend()
plt.show()
Try changing the degree (n) to see how the curve changes. You will notice that higher degrees fit the training data better but might behave erratically outside the range; this leads into the next chapter on Overfitting.
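One way to make that comparison concrete is to loop over several degrees and print the training R² score: it keeps rising with degree even when the curve itself starts to look erratic, which is exactly the overfitting warning sign. A sketch assuming X and y from the full example above:

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Training R² improves with degree, but the fit outside the data range gets worse
for n in (1, 2, 4, 8):
    poly = PolynomialFeatures(degree=n, include_bias=False)
    X_poly = poly.fit_transform(X)
    model = LinearRegression().fit(X_poly, y)
    print(n, r2_score(y, model.predict(X_poly)))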