Building Linear Regression Using NumPy
You already know what simple linear regression is and how to find the line that fits the data best. You will now go through all the steps of building a linear regression for a real dataset.
Loading Data
We have a file, simple_height_data.csv, with the data from our examples. We'll load the file and take a look at it:
import pandas as pd

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
df = pd.read_csv(file_link)  # Read the file
print(df.head())  # Print the first 5 instances from the dataset
So the dataset has two columns: the first is 'Father', which is the input feature, and the second is 'Height', which is our target variable.
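If you want to verify this, a quick sanity check with pandas (a minimal sketch; it only assumes the df loaded above) shows the shape and column types:

print(df.shape)       # Number of rows and columns
print(df.dtypes)      # Data types of the 'Father' and 'Height' columns
print(df.describe())  # Basic statistics for both columns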
We'll assign our target values to the y variable and feature values to X and build a scatterplot.
import matplotlib.pyplot as plt

X = df['Father']  # Assign the feature
y = df['Height']  # Assign the target

plt.scatter(X, y)  # Build scatterplot
plt.show()
Finding Parameters
NumPy has a handy function, np.polyfit(), for finding the parameters of a linear regression.
Linear Regression is a Polynomial Regression of degree 1 (we will talk about Polynomial Regression in later sections). That's why we pass deg=1 to get the parameters for the linear regression.
Here is an example:
import numpy as np

beta_1, beta_0 = np.polyfit(X, y, 1)  # Get the parameters
print('beta_0 is', beta_0)
print('beta_1 is', beta_1)
If you are unfamiliar with the syntax beta_1, beta_0 = np.polyfit(X, y, 1), it is called unpacking. If you have an iterable (e.g., a list, a NumPy array, or a pandas Series) that holds exactly two items, writing
a, b = my_iterator
is the same as
a = my_iterator[0]
b = my_iterator[1]
And since np.polyfit() with deg=1 returns a NumPy array with two values (the slope first, then the intercept), we can unpack it this way.
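To make this concrete, here is a short sketch (reusing the X and y defined above) showing that unpacking and indexing give the same result; np.polyfit() returns the coefficients ordered from the highest degree down, so the slope comes first:

coefficients = np.polyfit(X, y, 1)  # NumPy array: [slope, intercept]
beta_1 = coefficients[0]            # Same as the first unpacked value
beta_0 = coefficients[1]            # Same as the second unpacked value
print(coefficients)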
Making the Predictions
Now we can plot the line and predict new values using the parameters.
plt.scatter(X, y)  # Build a scatter plot
plt.plot(X, beta_0 + beta_1 * X, color='red')  # Plot the line
plt.show()
Now that we have the parameters, we can use the linear regression equation to predict new values.
X_new = np.array([65, 70, 75])  # Feature values of new instances
y_pred = beta_0 + beta_1 * X_new  # Predict the target
print('Predicted y: ', y_pred)
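As a side note, NumPy's np.polyval() computes the same predictions directly from the coefficient array; here is a minimal sketch using the values above (polyval expects the coefficients ordered from the highest degree down):

coefficients = np.array([beta_1, beta_0])  # Slope first, intercept second
y_pred = np.polyval(coefficients, X_new)   # Equivalent to beta_0 + beta_1 * X_new
print('Predicted y: ', y_pred)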
So it is pretty easy to get the parameters of a linear regression. But some libraries can also give you extra information about the model.
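For instance, statsmodels is one such library; the sketch below (an illustration using statsmodels' OLS class, not part of NumPy) fits the same regression and prints a full statistical summary with standard errors, p-values, and R-squared:

import statsmodels.api as sm

X_with_const = sm.add_constant(X)      # Add an intercept column to the feature
model = sm.OLS(y, X_with_const).fit()  # Fit ordinary least squares
print(model.summary())                 # Parameters plus extra statistical information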