Linear Regression with Python
Building Multiple Linear Regression
The OLS class lets you build a Multiple Linear Regression model the same way as a Simple Linear Regression model. Unfortunately, the np.polyfit() function does not handle the case of multiple features, so we will stick with the OLS class.
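np.polyfit() fits a polynomial in a single variable, so its x argument must be a 1-D array. The sketch below uses made-up values (not the course dataset) to show that a one-feature fit works while a two-column x is rejected; the exact error message may vary between NumPy versions.

```python
import numpy as np

# Made-up values, used only to illustrate the input shapes np.polyfit accepts
x_one_feature = np.array([65.0, 70.0, 75.0])                            # shape (3,)
x_two_features = np.array([[65.0, 62.0], [70.0, 65.0], [75.0, 70.0]])  # shape (3, 2)
y = np.array([66.0, 69.0, 73.0])

# One feature: np.polyfit fits y ≈ slope * x + intercept (coefficients are
# returned highest power first)
slope, intercept = np.polyfit(x_one_feature, y, deg=1)
print('slope:', slope, 'intercept:', intercept)

# Two features: np.polyfit rejects a 2-D x with a TypeError
try:
    np.polyfit(x_two_features, y, deg=1)
except TypeError as err:
    print('np.polyfit failed:', err)
```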
Building the X̃ matrix
We use the same dataset as in the simple linear regression example, but it now includes the mother's height as a second feature. Let's load it and look at its X variable.
import pandas as pd
import statsmodels.api as sm

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/heights_two_feature.csv'
df = pd.read_csv(file_link)  # Open the file

# Assign the variables
X = df[['Father', 'Mother']]
y = df['Height']
print(X.head())
Remember that we use OLS(y, X_tilde) to initialize the OLS object. As you can see, X already holds the two features in separate columns, so to get X_tilde we only need to add a column of 1s as the first column. The sm.add_constant(X) function does exactly that!
import pandas as pd
import statsmodels.api as sm

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/heights_two_feature.csv'
df = pd.read_csv(file_link)  # Open the file

# Assign the variables
X = df[['Father', 'Mother']]
y = df['Height']

# Create X_tilde
X_tilde = sm.add_constant(X)
print(X_tilde.head())
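For a DataFrame, sm.add_constant(X) places a new 'const' column of 1s before the feature columns. The same construction in plain NumPy, on three made-up rows, is sketched below; it assumes X does not already contain a constant column.

```python
import numpy as np

# Made-up feature matrix: each row is one person, columns are (Father, Mother)
X = np.array([[65.0, 62.0],
              [70.0, 65.0],
              [75.0, 70.0]])

# Prepend a column of 1s, mirroring what sm.add_constant(X) does by default
ones = np.ones((X.shape[0], 1))
X_tilde = np.hstack([ones, X])
print(X_tilde)
```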
Finding the parameters
Great! Now we can build the model, find the parameters and make predictions the same way we did in the previous section.
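For reference, with two features the model predicts the height as a linear combination of both parents' heights, and each row of X̃ is (1, Father, Mother). OLS picks the β that minimizes the sum of squared residuals; when X̃ᵀX̃ is invertible, that minimizer has the closed form shown last (the normal equation). The subscript notation here is our own.

$$
\hat{y} = \beta_0 + \beta_1 x_{\text{Father}} + \beta_2 x_{\text{Mother}}
$$

$$
\tilde{X} = \begin{pmatrix} 1 & x_{1,\text{Father}} & x_{1,\text{Mother}} \\ \vdots & \vdots & \vdots \\ 1 & x_{n,\text{Father}} & x_{n,\text{Mother}} \end{pmatrix}, \qquad \boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix}, \qquad \hat{\mathbf{y}} = \tilde{X}\boldsymbol{\beta}
$$

$$
\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \lVert \mathbf{y} - \tilde{X}\boldsymbol{\beta} \rVert^2 = (\tilde{X}^\top \tilde{X})^{-1} \tilde{X}^\top \mathbf{y}
$$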
import pandas as pd
import statsmodels.api as sm
import numpy as np

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/heights_two_feature.csv'
df = pd.read_csv(file_link)  # Open the file
X, y = df[['Father', 'Mother']], df['Height']  # Assign the variables
X_tilde = sm.add_constant(X)  # Create X_tilde

# Initialize an OLS object
regression_model = sm.OLS(y, X_tilde)
# Train the model (fit() returns a results object that holds the parameters)
regression_model = regression_model.fit()
# Get the parameters
beta_0, beta_1, beta_2 = regression_model.params
print('beta_0 is: ', beta_0)
print('beta_1 is: ', beta_1)
print('beta_2 is: ', beta_2)

# Predict new values
X_new = np.array([[65, 62], [70, 65], [75, 70]])  # Feature values of new instances
X_new_tilde = sm.add_constant(X_new)  # Preprocess X_new
y_pred = regression_model.predict(X_new_tilde)  # Predict the target
print('Predictions:', y_pred)
Note
Now that our training set has 2 features, we need to provide 2 feature values for each new instance we want to predict. That's why np.array([[65, 62], [70, 65], [75, 70]]) was used in the example above: each row is one instance, and the columns follow the same order as in training (Father, then Mother). It predicts y for 3 new instances: [Father: 65, Mother: 62], [Father: 70, Mother: 65], [Father: 75, Mother: 70].
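To see what fitting and predict() compute for two features, here is a NumPy-only sketch on made-up data (not the heights dataset). It swaps in np.linalg.lstsq for statsmodels, assumes X̃ has full column rank, and generates noise-free targets from assumed parameters so the recovered values can be checked. As far as we know, statsmodels' OLS.fit() uses a pseudoinverse by default, which gives the same least-squares solution in the full-rank case.

```python
import numpy as np

# Made-up training data (NOT the course dataset); columns are (Father, Mother)
X = np.array([[64.0, 60.0],
              [66.0, 63.0],
              [68.0, 62.0],
              [70.0, 66.0],
              [72.0, 64.0]])
# Assumed "true" parameters beta_0, beta_1, beta_2, used to generate noise-free targets
true_beta = np.array([10.0, 0.5, 0.4])

# X_tilde: a column of 1s followed by the feature columns
X_tilde = np.column_stack([np.ones(len(X)), X])
y = X_tilde @ true_beta

# Least squares: find the beta that minimizes ||y - X_tilde @ beta||^2
beta_hat, *_ = np.linalg.lstsq(X_tilde, y, rcond=None)
beta_0, beta_1, beta_2 = beta_hat
print('beta_0 is:', beta_0)
print('beta_1 is:', beta_1)
print('beta_2 is:', beta_2)

# Predicting: build X_new_tilde the same way, then take X_new_tilde @ beta_hat
X_new = np.array([[65.0, 62.0], [70.0, 65.0], [75.0, 70.0]])  # one row per instance
X_new_tilde = np.column_stack([np.ones(len(X_new)), X_new])
y_pred = X_new_tilde @ beta_hat
print('Predictions:', y_pred)
```

Building the column of 1s by hand also avoids an edge case of sm.add_constant: with its default has_constant='skip', it returns the data unchanged when it detects a column that is already constant, which always happens when X_new has a single row. Passing has_constant='add' forces the column of 1s to be added.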