Building Polynomial Regression
Loading File
We load poly.csv and inspect it:
import pandas as pd

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)
print(df.head())
Then visualize the relation:
import matplotlib.pyplot as plt

X = df['Feature']
y = df['Target']
plt.scatter(X, y)
plt.show()
A straight line fits poorly, so Polynomial Regression is more suitable.
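To see this for yourself, you can fit a plain LinearRegression on the raw feature and plot it over the scatter. This is a minimal sketch that assumes df has already been loaded as above:

from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Fit a straight line to the raw feature (no polynomial terms)
lin = LinearRegression()
lin.fit(df[['Feature']], df['Target'])

# Plot the data and the linear fit for comparison
plt.scatter(df['Feature'], df['Target'], label='Data')
plt.plot(df['Feature'], lin.predict(df[['Feature']]), color='red', label='Linear fit')
plt.legend()
plt.show()

The straight line cuts through the curved pattern instead of following it, which motivates adding polynomial features.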
Building Transformed Matrix
To create polynomial features, we could add squared features manually:
df['Feature_squared'] = df['Feature'] ** 2
But for higher degrees, the PolynomialFeatures class from sklearn.preprocessing is much easier and more efficient. It requires a 2-D structure (DataFrame or 2-D array):
from sklearn.preprocessing import PolynomialFeatures
X = df[['Feature']]
# Create the transformer
poly = PolynomialFeatures(degree=2, include_bias=False)
# Transform the data
X_poly = poly.fit_transform(X)
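You can inspect the transformed matrix to confirm what was generated. A short sketch, assuming poly and X from the snippet above (get_feature_names_out is available in recent scikit-learn versions):

# Inspect the generated columns: the original feature and its square
print(X_poly.shape)                  # (n_samples, 2) for degree=2 without bias
print(poly.get_feature_names_out())  # ['Feature', 'Feature^2']
print(X_poly[:3])                    # first three transformed rows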
Parameters
The PolynomialFeatures class has several important parameters:
- degree (default=2): the degree of the polynomial features. For example, with a single feature x and degree=3, it generates x, x^2, x^3 (plus a bias column if include_bias=True).
- interaction_only (default=False): if True, only interaction features are produced (e.g., x1 * x2), avoiding pure powers like x1^2 or x2^2.
- include_bias (default=True): if True, it adds a column of ones (the bias column).

A short demonstration of these parameters follows the note below.
Important: since LinearRegression calculates the intercept automatically, we usually set include_bias=False to avoid redundancy.
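A quick way to see what each parameter does is to transform a tiny two-feature array and compare the outputs. The array below is made up purely for illustration:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_demo = np.array([[2.0, 3.0]])  # one sample, two features: x1=2, x2=3

# degree=2 with bias: 1, x1, x2, x1^2, x1*x2, x2^2
full = PolynomialFeatures(degree=2, include_bias=True)
print(full.fit_transform(X_demo))   # [[1. 2. 3. 4. 6. 9.]]

# interaction_only=True drops the pure powers, keeping 1, x1, x2, x1*x2
inter = PolynomialFeatures(degree=2, interaction_only=True, include_bias=True)
print(inter.fit_transform(X_demo))  # [[1. 2. 3. 6.]]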
Building the Polynomial Regression
Once we have the transformed features (X_poly), we can use the standard LinearRegression model.
from sklearn.linear_model import LinearRegression
y = df['Target']
# Initialize and train the model
model = LinearRegression()
model.fit(X_poly, y)
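After fitting, you can read off the learned parameters; with degree=2 and include_bias=False there is one coefficient per generated column (Feature and Feature^2). A short sketch:

# The fitted model is approximately: y = intercept + coef[0]*x + coef[1]*x^2
print(model.intercept_)
print(model.coef_)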
Predicting requires transforming the new data using the same transformer instance before passing it to the model:
# Transform new data
X_new_poly = poly.transform(X_new)
# Predict
y_pred = model.predict(X_new_poly)
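If you prefer not to manage the transformer separately, scikit-learn's make_pipeline can chain PolynomialFeatures and LinearRegression into a single estimator, so new data is transformed automatically inside predict. A minimal sketch, assuming df is loaded as above:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Chain the transform and the regression into one estimator
poly_model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression()
)
poly_model.fit(df[['Feature']], df['Target'])

# predict() applies the polynomial transform internally
y_pred = poly_model.predict(df[['Feature']])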
Full Example
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Load data
file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)
X = df[['Feature']]
y = df['Target']

# 1. Create Polynomial Features
n = 2
poly = PolynomialFeatures(degree=n, include_bias=False)
X_poly = poly.fit_transform(X)

# 2. Train Linear Regression
model = LinearRegression()
model.fit(X_poly, y)

# 3. Predict on new data
X_new = np.linspace(-0.1, 1.5, 80).reshape(-1, 1)
X_new_poly = poly.transform(X_new)
y_pred = model.predict(X_new_poly)

# Visualization
plt.scatter(X, y, label='Data')
plt.plot(X_new, y_pred, color='red', label=f'Degree {n}')
plt.legend()
plt.show()
Try changing the degree (n) to see how the curve changes. You will notice that higher degrees fit the training data better but might behave erratically outside the range; this leads into the next chapter on Overfitting.
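One way to make that comparison concrete is to loop over several degrees and print the training R² score: it keeps rising with degree even when the curve itself starts to look erratic, which is exactly the overfitting warning sign. A sketch assuming X and y from the full example above:

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Training R² improves with degree, but the fit outside the data range gets worse
for n in (1, 2, 4, 8):
    poly = PolynomialFeatures(degree=n, include_bias=False)
    X_poly = poly.fit_transform(X)
    model = LinearRegression().fit(X_poly, y)
    print(n, r2_score(y, model.predict(X_poly)))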