Building Multiple Linear Regression

The OLS class lets you build a Multiple Linear Regression model the same way as a Simple Linear Regression model. Unfortunately, the np.polyfit() function fits a polynomial in a single variable, so it cannot handle the case of multiple features.

We will stick with the OLS class.

Building X̃ matrix

We have the same dataset from the simple linear regression example, but it now has the mother's height as the second feature. Let's load it and look at its X variable.


import pandas as pd
import statsmodels.api as sm

file_link='https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/heights_two_feature.csv'
df = pd.read_csv(file_link)	# Open the file
# Assign the variables
X = df[['Father', 'Mother']]
y = df['Height']
print(X.head())

Remember, we should use OLS(y, X_tilde) to initialize the OLS object. As you can see, the X variable already holds the two features in separate columns, so to get X_tilde we only need to add a column of 1s as the first column. The sm.add_constant(X) function does exactly that!


import pandas as pd
import statsmodels.api as sm

file_link='https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/heights_two_feature.csv'
df = pd.read_csv(file_link)	# Open the file
# Assign the variables
X = df[['Father', 'Mother']]
y = df['Height']
# Create X_tilde
X_tilde = sm.add_constant(X)
print(X_tilde.head())
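
To see what "adding 1s as a first column" means, here is a minimal NumPy sketch that builds the same kind of matrix by hand. The feature values are made up for illustration and are not taken from the dataset.

```python
import numpy as np

# Illustrative feature values (made up): columns are [Father, Mother]
X = np.array([[65.0, 62.0],
              [70.0, 65.0],
              [75.0, 70.0]])

# One column of 1s, with one row per instance
ones = np.ones((X.shape[0], 1))

# Put the 1s in front of the features to form X_tilde
X_tilde = np.hstack([ones, X])
print(X_tilde)
```

For a DataFrame, sm.add_constant(X) gives the same layout and names the new column const.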

Finding the parameters

Great! Now we can build the model, find the parameters and make predictions the same way we did in the previous section.


import pandas as pd
import statsmodels.api as sm
import numpy as np

file_link='https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/heights_two_feature.csv'
df = pd.read_csv(file_link)	  # Open the file
X,y = df[['Father', 'Mother']], df['Height']   # Assign the variables
X_tilde = sm.add_constant(X)	# Create X_tilde
# Initialize an OLS object
regression_model = sm.OLS(y, X_tilde)
# Train the object
regression_model = regression_model.fit()
# Get the parameters
beta_0, beta_1, beta_2 = regression_model.params
print('beta_0 is: ', beta_0)
print('beta_1 is: ', beta_1)
print('beta_2 is: ', beta_2)
# Predict new values
X_new = np.array([[65, 62],[70, 65],[75, 70]])	# Feature values of new instances
X_new_tilde = sm.add_constant(X_new)	# Preprocess X_new
y_pred = regression_model.predict(X_new_tilde)	# Predict the target
print('Predictions:', y_pred)
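
As a sanity check on what the fitted parameters mean, the sketch below solves the same least-squares problem with plain NumPy. It uses made-up data generated from known parameters (the true_beta values are an assumption chosen for illustration), so the estimate can be compared against them.

```python
import numpy as np

# Made-up parameters for illustration: [beta_0, beta_1, beta_2]
true_beta = np.array([2.0, 0.5, 0.3])

# Made-up feature values: columns are [Father, Mother]
X = np.array([[65, 62], [70, 65], [75, 70], [68, 60], [72, 66]], dtype=float)
X_tilde = np.column_stack([np.ones(len(X)), X])  # Add the column of 1s

# Generate the target exactly from the known parameters (no noise)
y = X_tilde @ true_beta

# Solve the least-squares problem for the parameters
beta, *_ = np.linalg.lstsq(X_tilde, y, rcond=None)
print('Estimated parameters:', beta)

# A prediction is beta_0 + beta_1 * Father + beta_2 * Mother
X_new = np.array([[65, 62], [70, 65]], dtype=float)
X_new_tilde = np.column_stack([np.ones(len(X_new)), X_new])
y_pred = X_new_tilde @ beta
print('Predictions:', y_pred)
```

Because the data here contains no noise, the estimated parameters match the ones used to generate it. With real data such as the heights dataset, the fit only minimizes the squared error.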

Note

Now that our training set has 2 features, we need to provide 2 features for each new instance we want to predict, in the same order as the training columns (Father, then Mother). That's why np.array([[65, 62], [70, 65], [75, 70]]) was used in the example above. It predicts y for 3 new instances: [Father: 65, Mother: 62], [Father: 70, Mother: 65], and [Father: 75, Mother: 70].
