Building Linear Regression Using Statsmodels
Building a Linear Regression Model
In statsmodels, the OLS class can be used to create a linear regression model.
First, initialize an OLS class object using sm.OLS(y, X_tilde), then train it using the fit() method:
model = sm.OLS(y, X_tilde)
model = model.fit()
This is equivalent to:
model = sm.OLS(y, X_tilde).fit()
The constructor of the OLS class expects the array X_tilde as input, the same matrix we saw in the Normal Equation: X with an extra column of ones for the intercept. OLS does not add this column for you, so you need to convert your X array to X_tilde yourself. The sm.add_constant() function does this.
Finding Parameters
Once the model is trained, you can access the parameters through the params attribute.
import statsmodels.api as sm
import pandas as pd

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv')
X, y = df['Father'], df['Height']
X_tilde = sm.add_constant(X)
model = sm.OLS(y, X_tilde).fit()
beta_0, beta_1 = model.params
print(beta_0, beta_1)
Making the Predictions
You can predict new instances with the predict() method, but the new input must be preprocessed with sm.add_constant() as well:
import numpy as np

X_new = np.array([65, 70, 75])
X_new_tilde = sm.add_constant(X_new)
print(model.predict(X_new_tilde))
Getting the Summary
As you probably noticed, using the OLS class is not as easy as using the polyfit() function. But OLS has its benefits: while training, it calculates a lot of statistical information, which you can access using the summary() method.
print(model.summary())
That's a lot of statistics. We will discuss the table's most important parts in later sections.