
Building Linear Regression Using NumPy

You already know what simple linear regression is and how to find the line that fits the data best. You will now go through all the steps of building a linear regression model on a real dataset.

Loading Data

We have a file, simple_height_data.csv, with the data from our examples. We'll load the file and take a look at it:

import pandas as pd

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
df = pd.read_csv(file_link)  # Read the file
print(df.head())  # Print the first 5 rows of the dataset
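To see what read_csv() and head() do without network access, here is a minimal sketch on a small in-memory CSV. The rows are made-up values for illustration and are not taken from simple_height_data.csv; only the column names 'Father' and 'Height' come from the lesson, and the names csv_text and sample_df are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample rows (illustrative values only, not the real dataset)
csv_text = """Father,Height
65.0,66.5
70.0,69.0
72.5,71.0
68.0,67.5
75.0,73.0
66.0,66.0
"""

# read_csv accepts file-like objects as well as URLs and file paths
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.head())   # First 5 rows by default
print(sample_df.shape)    # (number of rows, number of columns)
print(sample_df.dtypes)   # Data type of each column
```

Checking shape and dtypes right after loading is a quick way to confirm that the columns were parsed as numbers before fitting anything.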

So the dataset has two columns: the first is 'Father', which is the input feature, and the second is 'Height', which is our target variable.

We'll assign our target values to the y variable and feature values to X and build a scatterplot.

import pandas as pd
import matplotlib.pyplot as plt

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
df = pd.read_csv(file_link)  # Read the file

X = df['Father']  # Assign the feature
y = df['Height']  # Assign the target
plt.scatter(X, y)  # Build a scatter plot
plt.show()

Finding Parameters

NumPy provides a function, np.polyfit(), that finds the parameters of a linear regression.

Linear Regression is a Polynomial Regression of degree 1 (we will talk about Polynomial Regression in later sections). That's why we need to set deg=1 to get the parameters for the linear regression.
Here is an example:

import pandas as pd
import numpy as np

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
df = pd.read_csv(file_link)  # Read the file

X, y = df['Father'], df['Height']  # Assign the variables
beta_1, beta_0 = np.polyfit(X, y, 1)  # Get the parameters
print('beta_0 is', beta_0)
print('beta_1 is', beta_1)
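As a sanity check, the parameters returned by np.polyfit() with degree 1 should agree with the closed-form least squares estimates for simple linear regression. The sketch below compares the two on a small made-up dataset; the arrays x and y are hypothetical values, not the lesson's data.

```python
import numpy as np

# Hypothetical data lying close to a straight line (not the lesson's dataset)
x = np.array([60.0, 62.0, 65.0, 68.0, 70.0, 73.0, 75.0])
y = np.array([32.1, 32.9, 34.6, 36.1, 36.8, 38.6, 39.4])

# Closed-form least squares estimates for simple linear regression
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# The same fit with np.polyfit (degree 1)
beta_1, beta_0 = np.polyfit(x, y, 1)

print('closed form:', intercept, slope)
print('np.polyfit: ', beta_0, beta_1)
print('match:', np.allclose([beta_0, beta_1], [intercept, slope]))
```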
Note

If you are unfamiliar with the syntax beta_1, beta_0 = np.polyfit(X, y, 1), it is called unpacking. If you have a sequence (e.g., a list or a NumPy array) that contains two items, writing

a, b = my_sequence

is the same as

a = my_sequence[0]
b = my_sequence[1]

Since np.polyfit() with deg=1 returns a NumPy array with two values, the slope followed by the intercept, we can unpack it this way.
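A short sketch of unpacking in action. The points are constructed to lie exactly on the line y = 1 + 2x (an assumption made for illustration), so the expected slope and intercept are known in advance.

```python
import numpy as np

# Unpacking a plain list: names are assigned in order
a, b = [10, 20]
print(a, b)

# Points exactly on the line y = 1 + 2 * x, so the expected fit is known
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1 + 2 * x

coeffs = np.polyfit(x, y, 1)  # Array of length deg + 1, highest degree first
beta_1, beta_0 = coeffs       # Slope first, then intercept

print(coeffs.shape)
print(np.isclose(beta_1, 2.0), np.isclose(beta_0, 1.0))
```

Because the coefficients come back ordered from the highest degree down, swapping the names on the left-hand side would silently swap the slope and the intercept.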

Making the Predictions

Now we can use the parameters to plot the line and predict the target for new instances.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
df = pd.read_csv(file_link)  # Read the file

X, y = df['Father'], df['Height']  # Assign the variables
beta_1, beta_0 = np.polyfit(X, y, 1)  # Get the parameters
plt.scatter(X, y)  # Build a scatter plot
plt.plot(X, beta_0 + beta_1 * X, color='red')  # Plot the line
plt.show()

Now that we have the parameters, we can use the linear regression equation to predict new values.

import pandas as pd
import numpy as np

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
df = pd.read_csv(file_link)  # Read the file

X, y = df['Father'], df['Height']  # Assign the variables
beta_1, beta_0 = np.polyfit(X, y, 1)  # Get the parameters
X_new = np.array([65, 70, 75])  # Feature values of new instances
y_pred = beta_0 + beta_1 * X_new  # Predict the target
print('Predicted y: ', y_pred)
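As an alternative to writing the equation out by hand, NumPy's np.polyval() evaluates a polynomial from its coefficients, given in the same highest-degree-first order that np.polyfit() returns. The sketch below uses hypothetical parameter values (not fitted on the lesson's dataset) to show that both approaches give the same predictions.

```python
import numpy as np

# Hypothetical parameters for illustration (not fitted on the lesson's dataset)
beta_0, beta_1 = 30.0, 0.5

X_new = np.array([65, 70, 75])  # Feature values of new instances

manual = beta_0 + beta_1 * X_new                    # Regression equation written out
with_polyval = np.polyval([beta_1, beta_0], X_new)  # Coefficients, highest degree first

print(manual)
print(with_polyval)
print(np.allclose(manual, with_polyval))
```

With a real fit, the array returned by np.polyfit(X, y, 1) can be passed to np.polyval() directly, since it is already in the expected order.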

So getting the parameters of a linear regression with NumPy is pretty easy. Some other libraries, however, can also give you extra information about the model.


Section 1. Chapter 3

