
Building Multiple Linear Regression

The OLS class allows you to build a Multiple Linear Regression model the same way as a Simple Linear Regression one. The np.polyfit() function, unfortunately, cannot handle the multiple-features case.

We will stick with the OLS class.
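In other words, with the father's and mother's heights as the two features, the model we want to fit can be written as:

Height = β₀ + β₁·Father + β₂·Mother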

Building the X̃ Matrix

We'll use the same dataset as in the simple linear regression example, but it now contains the mother's height as a second feature. Let's load it and look at its X variable:

import pandas as pd
import statsmodels.api as sm

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/heights_two_feature.csv'
df = pd.read_csv(file_link)  # Open the file

# Assign the variables
X = df[['Father', 'Mother']]
y = df['Height']
print(X.head())

Remember, we should use OLS(y, X_tilde) to initialize the OLS object. As you can see, the X variable already holds the two features in separate columns, so to get X_tilde we only need to add a column of 1s at the front. The sm.add_constant(X) function does exactly that!

# Create X_tilde
X_tilde = sm.add_constant(X)
print(X_tilde.head())
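If you are curious what sm.add_constant() does under the hood, here is a minimal NumPy sketch that builds an equivalent matrix by hand (statsmodels also names the new column 'const'; the manual version below simply stacks a plain column of 1s):

import numpy as np

# Prepend a column of ones to the feature matrix by hand
ones = np.ones((len(X), 1))                        # shape (n, 1)
X_tilde_manual = np.hstack([ones, X.to_numpy()])   # shape (n, 3): [1, Father, Mother]
print(X_tilde_manual[:5])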

Finding the Parameters

Great! Now we can build the model, find the parameters and make predictions the same way we did in the previous section.

import numpy as np

# Initialize an OLS object
regression_model = sm.OLS(y, X_tilde)
# Train the object
regression_model = regression_model.fit()
# Get the parameters
beta_0, beta_1, beta_2 = regression_model.params
print('beta_0 is: ', beta_0)
print('beta_1 is: ', beta_1)
print('beta_2 is: ', beta_2)
# Predict new values
X_new = np.array([[65, 62], [70, 65], [75, 70]])  # Feature values of new instances
X_new_tilde = sm.add_constant(X_new)  # Preprocess X_new
y_pred = regression_model.predict(X_new_tilde)  # Predict the target
print('Predictions:', y_pred)
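As an optional sanity check, the same parameters can be recovered directly from the normal equation β̃ = (X̃ᵀX̃)⁻¹X̃ᵀy. Here is a sketch assuming the X_tilde and y variables defined above:

# Compute the parameters manually with the normal equation
X_mat = X_tilde.to_numpy()
y_vec = y.to_numpy()
beta_manual = np.linalg.inv(X_mat.T @ X_mat) @ X_mat.T @ y_vec
print('Manual parameters:', beta_manual)  # should match regression_model.params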
Note

Now that our training set has 2 features, we need to provide 2 feature values for each new instance we want to predict. That's why np.array([[65, 62], [70, 65], [75, 70]]) was used in the example above: it predicts y for 3 new instances, [Father: 65, Mother: 62], [Father: 70, Mother: 65], and [Father: 75, Mother: 70].
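To make the role of each parameter concrete, here is a small hand computation (assuming the beta_0, beta_1, and beta_2 values obtained above) that reproduces the first prediction:

# The prediction for [Father: 65, Mother: 62] is just a weighted sum of the features
manual_pred = beta_0 + beta_1 * 65 + beta_2 * 62
print('Manual prediction:', manual_pred)  # should equal y_pred[0]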



Section 2. Chapter 3
