Predict Prices Using Two Features | Multiple Linear Regression
Linear Regression with Python


For this challenge, we use the same housing dataset, but now with two features: the age and the area of the house (the age and square_feet columns).

import pandas as pd

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/houseprices.csv')
print(df.head())
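With two features, the model fitted in this challenge can be written as below. The symbols β₀ (intercept), β₁, β₂ (coefficients), ε (error term) and n (number of houses) are notation introduced here for illustration. The column of ones in X̃ is what sm.add_constant() prepends, and the closed form on the right assumes X̃ᵀX̃ is invertible.

```latex
\text{price} = \beta_0 + \beta_1 \cdot \text{age} + \beta_2 \cdot \text{square\_feet} + \varepsilon

\tilde{X} =
\begin{bmatrix}
1 & \text{age}_1 & \text{square\_feet}_1 \\
\vdots & \vdots & \vdots \\
1 & \text{age}_n & \text{square\_feet}_n
\end{bmatrix},
\qquad
\hat{\beta} = \arg\min_{\beta} \lVert y - \tilde{X}\beta \rVert^2 = (\tilde{X}^\top \tilde{X})^{-1} \tilde{X}^\top y
```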

Your task is to build a multiple linear regression model using the OLS class from statsmodels, then print its summary table to check the p-value of each feature.

Task


  1. Assign the 'age' and 'square_feet' columns of df to X.
  2. Preprocess X for the OLS class constructor by adding a constant column.
  3. Build and train the model using the OLS class.
  4. Preprocess the X_new array the same way as X.
  5. Predict the target for X_new.
  6. Print the model's summary table.
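The preprocessing, training and prediction steps above can be sketched in plain NumPy to show what each one does. This swaps in np.linalg.lstsq for sm.OLS and uses made-up, noiseless data instead of the chapter's dataset; the coefficients in true_beta are hypothetical. Because the data has no noise, the recovered coefficients should match the ones used to generate it.

```python
import numpy as np

# Hypothetical stand-in for the housing data (not the real dataset):
# price depends linearly on age and square_feet, with no noise.
rng = np.random.default_rng(0)
age = rng.uniform(0, 80, size=50)
square_feet = rng.uniform(5000, 20000, size=50)
X = np.column_stack([age, square_feet])
true_beta = np.array([20000.0, -500.0, 15.0])  # hypothetical intercept and slopes
y = true_beta[0] + X @ true_beta[1:]

# Step 2: prepend a column of ones (what sm.add_constant() does by default)
X_tilde = np.column_stack([np.ones(len(X)), X])

# Step 3: "train" by solving min ||y - X_tilde @ beta||^2 (least squares)
beta_hat, *_ = np.linalg.lstsq(X_tilde, y, rcond=None)

# Steps 4-5: preprocess X_new the same way as X, then predict
X_new = np.array([[4, 10000], [30, 14000], [70, 16000]])
X_new_tilde = np.column_stack([np.ones(len(X_new)), X_new])
y_pred = X_new_tilde @ beta_hat

print('Coefficients:', beta_hat)
print('Prediction:', y_pred)
```

With real, noisy data the estimates would only approximate the true relationship, which is why the summary table reports standard errors and p-values.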

Solution

import pandas as pd
import numpy as np
import statsmodels.api as sm

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/houseprices.csv')
# Assign the variables
X = df[['age', 'square_feet']]
y = df['price']
# Preprocess X
X_tilde = sm.add_constant(X)
# Build and train the model
model = sm.OLS(y, X_tilde).fit()
# Create and preprocess X_new
X_new = np.array([[4, 10000], [30, 14000], [70, 16000]])
X_new_tilde = sm.add_constant(X_new)
# Predict instances from X_new and print them
y_pred = model.predict(X_new_tilde)
print('Prediction:', np.floor(y_pred)) # np.floor() rounds each prediction down to a whole number
# Print the summary table
print(model.summary())
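One caveat with preprocessing X_new: by default, sm.add_constant() uses has_constant='skip', which returns the input unchanged when it already contains a constant, non-zero column. A single-row X_new makes every column constant, so no intercept column is added and prediction then fails because the column count no longer matches the fitted coefficients. Passing has_constant='add' to sm.add_constant() forces the column. The sketch below shows the same idea with a hypothetical NumPy helper, add_intercept, that always prepends ones.

```python
import numpy as np

def add_intercept(X):
    """Hypothetical helper: always prepend a column of ones to a feature array.

    Unlike sm.add_constant() with its default has_constant='skip', this never
    skips the column, even when X has a single row.
    """
    # A 1-D row such as [4, 10000] becomes shape (1, 2), i.e. one house
    X = np.atleast_2d(np.asarray(X, dtype=float))
    return np.column_stack([np.ones(X.shape[0]), X])

single = add_intercept([[4, 10000]])
print(single.shape)
print(add_intercept([[4, 10000], [30, 14000], [70, 16000]]))
```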

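If you did everything right, the p-values for both age and square_feet should be close to zero. This means both features are statistically significant for the model: the data gives strong evidence that neither coefficient is zero.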
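The p-values in the summary's P>|t| column come from t-statistics: each coefficient divided by its standard error. The sketch below computes these by hand with NumPy on hypothetical synthetic data (not the housing dataset) in which both features truly affect the target. It swaps in a normal approximation for the t-distribution, so its p-values only approximate what model.summary() reports.

```python
import math
import numpy as np

# Hypothetical noisy data: target depends on two features plus random noise
rng = np.random.default_rng(42)
n = 200
X = rng.uniform(0, 10, size=(n, 2))
y = 5.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(0, 1.0, size=n)

# Fit OLS with an intercept column
X_tilde = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X_tilde, y, rcond=None)

# Residual variance with n - p degrees of freedom (p = 3 parameters incl. intercept)
residuals = y - X_tilde @ beta_hat
p = X_tilde.shape[1]
sigma2 = residuals @ residuals / (n - p)

# Standard errors: square roots of the diagonal of sigma2 * (X_tilde^T X_tilde)^(-1)
cov = sigma2 * np.linalg.inv(X_tilde.T @ X_tilde)
se = np.sqrt(np.diag(cov))
t_stats = beta_hat / se

# Two-sided p-values via a normal approximation (an assumption, reasonable for
# large n - p; statsmodels uses the t-distribution with n - p degrees of freedom)
p_values = [math.erfc(abs(t) / math.sqrt(2)) for t in t_stats]

for name, t, pv in zip(['const', 'x1', 'x2'], t_stats, p_values):
    print(f'{name}: t = {t:.1f}, p ≈ {pv:.3g}')
```

A large |t| means the coefficient is many standard errors away from zero, which is exactly when the p-value is close to zero.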
