Supervised Learning Essentials

Building Polynomial Regression

Loading File

We load poly.csv and inspect it:

import pandas as pd

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)
print(df.head())

Then visualize the relation:

import matplotlib.pyplot as plt

X = df['Feature']
y = df['Target']
plt.scatter(X, y)
plt.show()

A straight line fits poorly, so Polynomial Regression is more suitable.
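You can confirm this by fitting an ordinary straight line first and checking its score. A minimal sketch, reusing the df loaded above:

from sklearn.linear_model import LinearRegression

# Baseline: fit a plain straight line to the data
lin = LinearRegression()
lin.fit(df[['Feature']], df['Target'])

# A low R^2 here confirms that a straight line underfits this data
print('Straight-line R^2:', lin.score(df[['Feature']], df['Target']))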


Building the Transformed Matrix

To create polynomial features, we could add squared features manually:

df['Feature_squared'] = df['Feature'] ** 2

But for higher degrees, the PolynomialFeatures class from sklearn.preprocessing is much easier and more efficient. It requires a 2-D structure (DataFrame or 2-D array):

from sklearn.preprocessing import PolynomialFeatures

X = df[['Feature']]  # double brackets keep X 2-D (a DataFrame, not a Series)
# Create the transformer
poly = PolynomialFeatures(degree=2, include_bias=False)
# Transform the data
X_poly = poly.fit_transform(X)
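To check what the transformation produced, you can inspect the result. With degree=2 and include_bias=False, the two columns are the original feature and its square (get_feature_names_out requires scikit-learn 1.0+):

print(X_poly.shape)                   # (n_samples, 2)
print(poly.get_feature_names_out())   # ['Feature' 'Feature^2']
print(X_poly[:3])                     # first three transformed rows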

Parameters

The PolynomialFeatures class has several important parameters:

  • degree (default=2): the degree of the polynomial features. For example, if degree=3, it generates x, x², x³ (and, with multiple input features, all cross terms up to degree 3). A short sketch of the generated features follows the note below.
  • interaction_only (default=False): if True, only interaction features are produced (e.g., x₁x₂), avoiding pure powers like x₁² and x₂².
  • include_bias (default=True): if True, it adds a column of ones (the bias column).

Important: since LinearRegression calculates the intercept automatically, we usually set include_bias=False to avoid redundancy.
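Here is a minimal sketch of how these parameters interact, using a made-up two-feature array (X_demo is invented purely for illustration):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_demo = np.array([[2.0, 3.0]])  # hypothetical sample with two features

# Full expansion up to degree 3: powers and cross terms
full = PolynomialFeatures(degree=3, include_bias=False).fit(X_demo)
print(full.get_feature_names_out())
# ['x0' 'x1' 'x0^2' 'x0 x1' 'x1^2' 'x0^3' 'x0^2 x1' 'x0 x1^2' 'x1^3']

# interaction_only=True keeps cross terms but drops pure powers
inter = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False).fit(X_demo)
print(inter.get_feature_names_out())
# ['x0' 'x1' 'x0 x1']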

Building the Polynomial Regression

Once we have the transformed features (X_poly), we can use the standard LinearRegression model.

from sklearn.linear_model import LinearRegression

y = df['Target']

# Initialize and train the model
model = LinearRegression()
model.fit(X_poly, y)
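After fitting, the learned parameters are available as attributes. With degree=2, the model is a parabola:

# Fitted model: y ≈ intercept_ + coef_[0]*x + coef_[1]*x^2
print(model.intercept_)
print(model.coef_)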

Predicting requires transforming new data with the same fitted transformer instance before passing it to the model. Here X_new is a small example array (any 2-D input works):

import numpy as np

X_new = np.array([[0.5], [1.2]])  # example points; must be 2-D like the training data
# Transform new data with the fitted transformer
X_new_poly = poly.transform(X_new)
# Predict
y_pred = model.predict(X_new_poly)

Full Example

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Load data
file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)
X = df[['Feature']]
y = df['Target']

# 1. Create polynomial features
n = 2
poly = PolynomialFeatures(degree=n, include_bias=False)
X_poly = poly.fit_transform(X)

# 2. Train linear regression on the transformed features
model = LinearRegression()
model.fit(X_poly, y)

# 3. Predict on a grid of new points
X_new = np.linspace(-0.1, 1.5, 80).reshape(-1, 1)
X_new_poly = poly.transform(X_new)
y_pred = model.predict(X_new_poly)

# Visualization
plt.scatter(X, y, label='Data')
plt.plot(X_new, y_pred, color='red', label=f'Degree {n}')
plt.legend()
plt.show()

Try changing the degree (n) to see how the curve changes. You will notice that higher degrees fit the training data better but can behave erratically outside the data's range; this leads into the next chapter on Overfitting.
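As a starting point, here is a small sketch that overlays several degrees on one plot; it reuses X, y, and X_new from the full example above:

# Compare fits of increasing degree on the same axes
for n in [1, 2, 4, 8]:
    poly = PolynomialFeatures(degree=n, include_bias=False)
    model = LinearRegression().fit(poly.fit_transform(X), y)
    plt.plot(X_new, model.predict(poly.transform(X_new)), label=f'Degree {n}')

plt.scatter(X, y, color='gray', label='Data')
plt.legend()
plt.show()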

