Building Linear Regression
You already know what simple linear regression is and how to find the line that fits the data best. You will now go through all the steps of building a linear regression for a real dataset.
Loading Data
We have a file, simple_height_data.csv, with the data from our examples. We'll load the file and take a look at it:
```python
import pandas as pd

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
df = pd.read_csv(file_link)  # Read the file
print(df.head())             # Print the first 5 rows of the dataset
```
So the dataset has two columns: the first is 'Father', which is the input feature, and the second is 'Height', which is our target variable.
We'll assign our target values to the y variable and feature values to X and build a scatterplot.
```python
import matplotlib.pyplot as plt

X = df['Father']   # Assign the feature
y = df['Height']   # Assign the target

plt.scatter(X, y)  # Build the scatter plot
plt.show()
```
Finding Parameters
To implement linear regression in Scikit-learn, we use the LinearRegression class.
LinearRegression(fit_intercept=True, copy_X=True, n_jobs=None, positive=False)
Parameters
The LinearRegression class has several parameters that control how the model is fitted.
- fit_intercept (default=True): Whether to calculate the intercept (bias) for this model. If set to False, no intercept will be used in calculations (i.e., the data is expected to be centered).
- copy_X (default=True): If True, X will be copied; otherwise, it may be overwritten.
- n_jobs (default=None): The number of jobs to use for the computation. This only provides a speedup for n_targets > 1 and sufficiently large problems. None means 1 unless in a joblib.parallel_backend context; -1 means using all processors.
- positive (default=False): When set to True, forces the coefficients to be positive. This option is only supported for dense arrays.
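To see what fit_intercept actually changes, here is a quick sketch using made-up data that lies exactly on the line y = 2x + 5 (the numbers are for illustration only, not from the heights dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data lying exactly on the line y = 2x + 5
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = 2 * X.ravel() + 5

# Default: the intercept is estimated from the data
with_intercept = LinearRegression().fit(X, y)
print(with_intercept.intercept_, with_intercept.coef_)  # about 5.0 and [2.0]

# fit_intercept=False forces the line through the origin,
# so the slope has to absorb the offset and no longer equals 2
no_intercept = LinearRegression(fit_intercept=False).fit(X, y)
print(no_intercept.intercept_, no_intercept.coef_)      # intercept_ is fixed at 0.0
```

With fit_intercept=False the model stores intercept_ = 0.0 and fits only the slope, which is why that option is meant for pre-centered data.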
Common Methods
- fit(X, y): Fit the linear model.
- predict(X): Predict using the linear model.
- score(X, y): Return the coefficient of determination (R²) of the prediction.
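Here is a small sketch showing the three methods together on made-up numbers (not the heights dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny made-up dataset, for illustration only
X = np.array([[60.0], [65.0], [70.0], [75.0]])
y = np.array([62.0, 66.0, 71.0, 74.0])

model = LinearRegression()
model.fit(X, y)             # Learn intercept_ and coef_ from the data
y_pred = model.predict(X)   # Predictions for the training inputs
r2 = model.score(X, y)      # Coefficient of determination (R^2)
print(r2)
```

Since this made-up data is nearly linear, score() returns an R² close to 1; a value near 0 would mean the line explains almost none of the variation in y.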
Attributes
- coef_: Estimated coefficients for the linear regression problem.
- intercept_: Independent term (intercept) in the linear model.
In Scikit-learn, we use the LinearRegression class from the linear_model module. Unlike NumPy's np.polyfit, there is no degree to specify: this class is designed specifically for linear models. We use the .fit() method to calculate the parameters.
Here is an example:
```python
from sklearn.linear_model import LinearRegression
import numpy as np

# Scikit-learn expects X as a 2D array of shape (n_samples, n_features),
# so convert the pandas Series to a NumPy array and reshape it
X_reshaped = X.values.reshape(-1, 1)

model = LinearRegression()
model.fit(X_reshaped, y)       # Train the model

beta_0 = model.intercept_      # Get the intercept (beta_0)
beta_1 = model.coef_[0]        # Get the coefficient (beta_1)

print('beta_0 is', beta_0)
print('beta_1 is', beta_1)
```
If you are unfamiliar with the syntax model.intercept_ and model.coef_, this is a Scikit-learn convention. Attributes that are calculated (learned) during the training process always end with an underscore _ (e.g., intercept_, coef_).
The intercept_ is a single value, while coef_ is an array containing the coefficients for each feature (in simple linear regression, it has only one item).
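A quick sketch with made-up data makes both points concrete: learned attributes only exist after .fit(), and coef_ holds one entry per feature:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

model = LinearRegression()
print(hasattr(model, 'coef_'))  # False: nothing has been learned yet

# Made-up data on the line y = 2x (no intercept, no noise)
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
model.fit(X, y)

print(hasattr(model, 'coef_'))  # True: coef_ was created during training
print(model.coef_.shape)        # (1,) -- one coefficient per feature
print(float(model.intercept_))  # a single number, close to 0 here
```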
Making the Predictions
Now we can plot the line and predict new values using the trained model.
```python
plt.scatter(X, y)                                    # Build a scatter plot
plt.plot(X, model.predict(X_reshaped), color='red')  # Plot the line using predictions
plt.show()
```
Now that we have the trained model, we can use the .predict() method to predict new values.
```python
X_new = np.array([[65], [70], [75]])  # Feature values (must be 2D)
y_pred = model.predict(X_new)         # Predict the target
print('Predicted y: ', y_pred)
```
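To connect .predict() back to the line equation, here is a sketch (again with made-up heights) checking that beta_0 + beta_1 * x gives the same answer as the model:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up father/child heights, for illustration only
X = np.array([[65.0], [70.0], [75.0], [80.0]])
y = np.array([66.0, 69.0, 73.0, 77.0])

model = LinearRegression().fit(X, y)
beta_0 = model.intercept_
beta_1 = model.coef_[0]

x_new = 72.0
manual = beta_0 + beta_1 * x_new            # Apply the line equation by hand
from_predict = model.predict([[x_new]])[0]  # Same prediction via the model
print(manual, from_predict)
```

Both values match, because .predict() is simply evaluating the fitted line.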
As you can see, it is pretty easy to get the parameters of a linear regression. Some libraries can also give you extra statistical information about the fitted model.