Building Linear Regression Using NumPy
You already know what simple linear regression is and how to find the line that fits the data best. You will now go through all the steps of building a linear regression for a real dataset.
Loading Data
We have a file, simple_height_data.csv, with the data from our examples. We'll load the file and take a look at it:
import pandas as pd

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
df = pd.read_csv(file_link)  # Read the file
print(df.head())  # Print the first 5 instances from the dataset
So the dataset has two columns: the first is 'Father', which is the input feature, and the second is 'Height', which is our target variable.
We'll assign our target values to the y variable and feature values to X and build a scatterplot.
import matplotlib.pyplot as plt

X = df['Father']  # Assign the feature
y = df['Height']  # Assign the target
plt.scatter(X, y)  # Build scatterplot
plt.show()
Finding Parameters
NumPy provides a function, np.polyfit(), that finds the parameters of a linear regression.
Linear regression is a polynomial regression of degree 1 (we will talk about polynomial regression in later sections). That is why we pass 1 as the degree argument (deg=1) to get the parameters of the linear regression.
Here is an example:
import numpy as np

beta_1, beta_0 = np.polyfit(X, y, 1)  # Get the parameters
print('beta_0 is', beta_0)
print('beta_1 is', beta_1)
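To see what these two numbers represent, the sketch below compares np.polyfit with the closed-form least-squares formulas for the slope and intercept. The data is a small made-up sample (the values and the names x_demo and y_demo are illustrative assumptions, not taken from simple_height_data.csv), so it runs without downloading anything.

```python
import numpy as np

# Small made-up sample (illustrative values, not from simple_height_data.csv)
x_demo = np.array([60.0, 62.0, 64.0, 66.0, 68.0, 70.0])
y_demo = np.array([63.0, 64.5, 65.0, 67.5, 68.0, 70.0])

# Closed-form least-squares estimates for simple linear regression
x_centered = x_demo - x_demo.mean()
y_centered = y_demo - y_demo.mean()
slope = np.sum(x_centered * y_centered) / np.sum(x_centered ** 2)
intercept = y_demo.mean() - slope * x_demo.mean()

# np.polyfit returns coefficients from the highest degree down: [slope, intercept]
fit_slope, fit_intercept = np.polyfit(x_demo, y_demo, 1)

print('closed form:', intercept, slope)
print('np.polyfit: ', fit_intercept, fit_slope)
print('match:', np.allclose([slope, intercept], [fit_slope, fit_intercept]))
```

Both approaches should agree up to floating-point rounding, which is why the comparison uses np.allclose rather than ==.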
If you are unfamiliar with the syntax beta_1, beta_0 = np.polyfit(X, y, 1), it is called unpacking. If you have an iterable (e.g., a list, a NumPy array, or a pandas Series) that contains exactly two items, writing
a, b = my_iterable
is the same as
a = my_iterable[0]
b = my_iterable[1]
Since np.polyfit() with degree 1 returns a NumPy array with two values, the slope first and the intercept second, we can unpack it this way.
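The short sketch below illustrates unpacking on a hypothetical two-element NumPy array (coeffs and its values are made up for illustration). It also shows what happens when the number of names on the left does not match the number of items.

```python
import numpy as np

coeffs = np.array([2.5, -1.0])  # Hypothetical two-element array

# Unpacking assigns the items in order
a, b = coeffs

# Equivalent index-based assignment
a_idx = coeffs[0]
b_idx = coeffs[1]

print(a == a_idx and b == b_idx)  # True

# The number of names must match the number of items
try:
    c, d = np.array([1.0, 2.0, 3.0])
except ValueError as err:
    print('Unpacking failed:', err)
```

This is why the order of names matters in beta_1, beta_0 = np.polyfit(X, y, 1): the first name receives the first item of the returned array.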
Making the Predictions
Now we can use the parameters to plot the fitted line over the data.
plt.scatter(X, y)  # Build a scatter plot
plt.plot(X, beta_0 + beta_1 * X, color='red')  # Plot the line
plt.show()
Now that we have the parameters, we can use the linear regression equation to predict new values.
X_new = np.array([65, 70, 75])  # Feature values of new instances
y_pred = beta_0 + beta_1 * X_new  # Predict the target
print('Predicted y: ', y_pred)
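As an alternative to writing the equation out by hand, NumPy's np.polyval evaluates a polynomial from its coefficients. The sketch below uses made-up parameter values (demo_beta_0 and demo_beta_1 are assumptions standing in for the fitted parameters) to show that both approaches give the same predictions.

```python
import numpy as np

# Made-up parameter values standing in for the ones returned by np.polyfit
demo_beta_0, demo_beta_1 = 40.0, 0.4

X_new = np.array([65, 70, 75])  # Feature values of new instances

# Prediction written out with the regression equation
y_manual = demo_beta_0 + demo_beta_1 * X_new

# np.polyval expects coefficients from the highest degree down: [beta_1, beta_0]
y_polyval = np.polyval([demo_beta_1, demo_beta_0], X_new)

print(np.allclose(y_manual, y_polyval))  # True
```

Because np.polyval uses the same coefficient order as np.polyfit, the array returned by np.polyfit can be passed to it directly without unpacking.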
As you can see, getting the parameters of a linear regression is straightforward. Some other libraries, however, can also give you additional information about the model.