Learn Interpolation vs Extrapolation | Polynomial Regression
Linear Regression with Python

Interpolation vs Extrapolation

In the previous chapter, we saw that the predictions of different models diverge more and more toward the edges of the data.

Predictions become unreliable once we move outside the range of the training data. Predicting beyond that range is extrapolation, while predicting within it is interpolation.

Regression does not handle extrapolation well. It is meant for interpolation and can yield absurd predictions when new instances fall outside the range of the training set.
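A minimal sketch of how you might check which new feature values require extrapolation before trusting a prediction. The helper name extrapolation_mask and the sample values are hypothetical, chosen only for illustration.

```python
import numpy as np

def extrapolation_mask(x_train, x_new):
    """Return True for each new value that lies outside the training range."""
    x_train = np.asarray(x_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    return (x_new < x_train.min()) | (x_new > x_train.max())

# Hypothetical training feature values covering [0, 1]
x_train = np.linspace(0.0, 1.0, 50)
# New values; the first and last fall outside the training range
x_new = np.array([-0.1, 0.0, 0.5, 1.0, 1.5])

mask = extrapolation_mask(x_train, x_new)
print(mask)  # True marks a value that would be extrapolated
```

Predictions for the values flagged True deserve extra caution, since the model has seen no data in that region.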

Confidence Intervals

OLS can also return confidence intervals for the regression line:

lower = model.get_prediction(X_new_tilde).summary_frame(alpha)['mean_ci_lower']
upper = model.get_prediction(X_new_tilde).summary_frame(alpha)['mean_ci_upper']

alpha is the significance level (typically 0.05, which corresponds to a 95% confidence level). This gives lower and upper bounds for each row of X_new_tilde. You can then plot the regression line together with its confidence interval.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn.preprocessing import PolynomialFeatures # Import PolynomialFeatures class

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/poly.csv'
df = pd.read_csv(file_link)
n = 4 # A degree of the polynomial regression
X = df[['Feature']] # Assign X as a DataFrame
y = df['Target'] # Assign y
X_tilde = PolynomialFeatures(n).fit_transform(X) # Get X_tilde
regression_model = sm.OLS(y, X_tilde).fit() # Initialize and train the model
X_new = np.linspace(-0.1, 1.5, 80) # 1-d array of new feature values
X_new_tilde = PolynomialFeatures(n).fit_transform(X_new.reshape(-1,1)) # Transform X_new for predict() method
y_pred = regression_model.predict(X_new_tilde)
lower = regression_model.get_prediction(X_new_tilde).summary_frame(0.05)['mean_ci_lower'] # Get lower bound for each point
upper = regression_model.get_prediction(X_new_tilde).summary_frame(0.05)['mean_ci_upper'] # Get upper bound for each point
plt.scatter(X, y) # Build a scatterplot
plt.plot(X_new, y_pred) # Build a Polynomial Regression graph
plt.fill_between(X_new, lower, upper, alpha=0.4)
plt.show()

Since we don't know the true distribution of the target, the regression line is only an approximation. The confidence interval shows where the true line likely lies. The interval widens as we move farther from the training data.
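One way to see why the interval widens is the textbook form of the confidence interval for the mean prediction at a new point, under the usual OLS assumptions. The notation is introduced here and is not taken from the course: m is the number of training rows, p = n + 1 is the number of coefficients, and \hat{\sigma} is the residual standard error.

```latex
\hat{y}(x_0) \;\pm\; t_{1-\alpha/2,\; m-p}\;\hat{\sigma}\,
\sqrt{\tilde{x}_0^{\top}\,\bigl(\tilde{X}^{\top}\tilde{X}\bigr)^{-1}\,\tilde{x}_0}
```

Here \tilde{x}_0 is the row of polynomial features for the new point. The quadratic form under the square root grows as x_0 moves away from the bulk of the training features, which is what makes the band fan out at the edges.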

Note

The confidence intervals are built assuming we correctly chose the model (e.g., Simple Linear Regression or Polynomial Regression of degree 4).

If the model is chosen poorly, the confidence interval is unreliable, and so is the line itself. You will learn how to select the best model in the following section.


Section 3. Chapter 4

