Building Linear Regression

You already know what simple linear regression is and how to find the line that fits the data best. You will now go through all the steps of building a linear regression for a real dataset.

Loading Data

We have a file, simple_height_data.csv, with the data from our examples. We'll load the file and take a look at it:

import pandas as pd

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
df = pd.read_csv(file_link)  # Read the file
print(df.head())  # Print the first 5 instances of the dataset

So the dataset has two columns: the first is 'Father', which is the input feature, and the second is 'Height', which is our target variable.

We'll assign the target values to the y variable and the feature values to X, then build a scatter plot.

import matplotlib.pyplot as plt

X = df['Father']  # Assign the feature
y = df['Height']  # Assign the target
plt.scatter(X, y)  # Build a scatter plot
plt.show()

Finding Parameters

To implement linear regression in Scikit-learn, we use the LinearRegression class.

LinearRegression(fit_intercept=True, copy_X=True, n_jobs=None, positive=False)

Parameters

The LinearRegression class has several parameters that control how the model is fitted (a short configuration sketch follows the list below).

  • fit_intercept (default=True): Decides whether to calculate the intercept (bias) for this model. If set to False, no intercept will be used in calculations (i.e., data is expected to be centered).
  • copy_X (default=True): If True, X will be copied; else, it may be overwritten.
  • n_jobs (default=None): The number of jobs to use for the computation. This will only provide a speedup for n_targets > 1 and sufficiently large problems. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
  • positive (default=False): When set to True, forces the coefficients to be positive. This option is only supported for dense arrays.
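
For example, if the data is already centered or the coefficients must stay non-negative, these parameters can be changed when the model is created. The following is a minimal sketch with non-default settings chosen purely for illustration:

from sklearn.linear_model import LinearRegression

# Hypothetical configuration: no intercept, coefficients forced to be non-negative
model_custom = LinearRegression(fit_intercept=False, positive=True)
print(model_custom.get_params())  # Inspect the parameters the estimator was created with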

Common Methods

  • fit(X, y): Fit the linear model.
  • predict(X): Predict using the linear model.
  • score(X, y): Return the coefficient of determination of the prediction.

Attributes

  • coef_: Estimated coefficients for the linear regression problem.
  • intercept_: Independent term in the linear model.
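
The score(X, y) method is not used in the example below, so here is a minimal sketch on tiny made-up data showing how the methods and attributes fit together (the numbers are invented purely for illustration):

from sklearn.linear_model import LinearRegression
import numpy as np

# Made-up data that follows y = 2 * x + 1 exactly
X_toy = np.array([[1], [2], [3], [4]])
y_toy = np.array([3, 5, 7, 9])

model_toy = LinearRegression()
model_toy.fit(X_toy, y_toy)            # Fit the linear model
print(model_toy.coef_)                 # approx. [2.] - one coefficient per feature
print(model_toy.intercept_)            # approx. 1.0
print(model_toy.predict([[5]]))        # approx. [11.]
print(model_toy.score(X_toy, y_toy))   # approx. 1.0 - the R-squared of the prediction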

In Scikit-learn, we use the LinearRegression class from the linear_model module. Unlike NumPy, we don't specify a degree here, since this class is designed specifically for linear models. We use the .fit() method to calculate the parameters.
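
For comparison, assuming the NumPy approach from the earlier chapters used np.polyfit, the same parameters could be found like this (a sketch, not part of the Scikit-learn workflow below):

import numpy as np

# With NumPy the degree must be stated explicitly; deg=1 gives a straight line
beta_1_np, beta_0_np = np.polyfit(X, y, deg=1)  # Returns the slope first, then the intercept
print('beta_0 is', beta_0_np)
print('beta_1 is', beta_1_np)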

Here is the Scikit-learn example:

from sklearn.linear_model import LinearRegression
import numpy as np

# Convert the pandas Series to a NumPy array, then reshape to 2D
# (Scikit-learn expects a 2D feature matrix)
X_reshaped = X.values.reshape(-1, 1)

model = LinearRegression()
model.fit(X_reshaped, y)  # Train the model

beta_0 = model.intercept_  # Get the intercept (beta_0)
beta_1 = model.coef_[0]  # Get the coefficient (beta_1)
print('beta_0 is', beta_0)
print('beta_1 is', beta_1)
Note

If you are unfamiliar with the syntax model.intercept_ and model.coef_, this is a Scikit-learn convention. Attributes that are calculated (learned) during the training process always end with an underscore _ (e.g., intercept_, coef_). The intercept_ is a single value, while coef_ is an array containing the coefficients for each feature (in simple linear regression, it has only one item).

Making the Predictions

Now we can plot the regression line and predict new values using the trained model.

plt.scatter(X, y)  # Build a scatter plot
plt.plot(X, model.predict(X_reshaped), color='red')  # Plot the line using predictions
plt.show()

Now that we have the trained model, we can use the .predict() method to predict new values.

X_new = np.array([[65], [70], [75]])  # Feature values (must be 2D)
y_pred = model.predict(X_new)  # Predict the target
print('Predicted y:', y_pred)

As you can see, it is quite easy to obtain the parameters of a linear regression. Some libraries can also give you extra information about the fitted model.
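
For instance, a statistics-oriented library such as statsmodels (not used in this chapter, so treat this as an assumption-based sketch) can report standard errors, p-values, and confidence intervals alongside the parameters:

import statsmodels.api as sm

# statsmodels does not add an intercept automatically, so add a constant column
X_const = sm.add_constant(X)
ols_model = sm.OLS(y, X_const).fit()  # Fit ordinary least squares
print(ols_model.summary())            # Coefficients, p-values, R-squared, and more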
