Building Linear Regression

You already know what simple linear regression is and how to find the line that fits the data best. You will now go through all the steps of building a linear regression for a real dataset.

Loading Data

We have a file, simple_height_data.csv, with the data from our examples. We'll load the file and take a look at it:

import pandas as pd

file_link = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b22d1166-efda-45e8-979e-6c3ecfc566fc/simple_height_data.csv'
df = pd.read_csv(file_link)  # Read the file
print(df.head())  # Print the first 5 instances of the dataset

So the dataset has two columns: the first is 'Father', which is the input feature, and the second is 'Height', which is our target variable.

We'll assign the target values to the y variable and the feature values to X, then build a scatter plot.

import matplotlib.pyplot as plt

X = df['Father']  # Assign the feature
y = df['Height']  # Assign the target
plt.scatter(X, y)  # Build a scatter plot
plt.show()

Finding Parameters

To implement linear regression in Scikit-learn, we use the LinearRegression class.

LinearRegression(fit_intercept=True, copy_X=True, n_jobs=None, positive=False)

Parameters

The LinearRegression class has several parameters that control how the model is fitted (a short configuration sketch follows the list below).

  • fit_intercept (default=True): Decides whether to calculate the intercept (bias) for this model. If set to False, no intercept will be used in calculations (i.e., data is expected to be centered).
  • copy_X (default=True): If True, X will be copied; else, it may be overwritten.
  • n_jobs (default=None): The number of jobs to use for the computation. This will only provide a speedup for n_targets > 1 and sufficiently large problems. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
  • positive (default=False): When set to True, forces the coefficients to be positive. This option is only supported for dense arrays.
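
For example, if the data is already centered or the coefficients must stay non-negative, these parameters can be changed when the model is created. The following is a minimal sketch with non-default settings chosen purely for illustration:

from sklearn.linear_model import LinearRegression

# Hypothetical configuration: no intercept, coefficients forced to be non-negative
model_custom = LinearRegression(fit_intercept=False, positive=True)
print(model_custom.get_params())  # Inspect the parameters the estimator was created with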

Common Methods

  • fit(X, y): Fit the linear model.
  • predict(X): Predict using the linear model.
  • score(X, y): Return the coefficient of determination of the prediction.

Attributes

  • coef_: Estimated coefficients for the linear regression problem.
  • intercept_: Independent term in the linear model.
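
The score(X, y) method is not used in the example below, so here is a minimal sketch on tiny made-up data showing how the methods and attributes fit together (the numbers are invented purely for illustration):

from sklearn.linear_model import LinearRegression
import numpy as np

# Made-up data that follows y = 2 * x + 1 exactly
X_toy = np.array([[1], [2], [3], [4]])
y_toy = np.array([3, 5, 7, 9])

model_toy = LinearRegression()
model_toy.fit(X_toy, y_toy)            # Fit the linear model
print(model_toy.coef_)                 # approx. [2.] - one coefficient per feature
print(model_toy.intercept_)            # approx. 1.0
print(model_toy.predict([[5]]))        # approx. [11.]
print(model_toy.score(X_toy, y_toy))   # approx. 1.0 - the R-squared of the prediction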

In Scikit-learn, we use the LinearRegression class from the linear_model module. Unlike NumPy, we don't specify a degree here, since this class is designed specifically for linear models. We use the .fit() method to calculate the parameters.
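
For comparison, assuming the NumPy approach from the earlier chapters used np.polyfit, the same parameters could be found like this (a sketch, not part of the Scikit-learn workflow below):

import numpy as np

# With NumPy the degree must be stated explicitly; deg=1 gives a straight line
beta_1_np, beta_0_np = np.polyfit(X, y, deg=1)  # Returns the slope first, then the intercept
print('beta_0 is', beta_0_np)
print('beta_1 is', beta_1_np)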

Here is the Scikit-learn example:

from sklearn.linear_model import LinearRegression
import numpy as np

# Convert the pandas Series to a NumPy array, then reshape to 2D
# (Scikit-learn expects a 2D feature matrix)
X_reshaped = X.values.reshape(-1, 1)

model = LinearRegression()
model.fit(X_reshaped, y)  # Train the model

beta_0 = model.intercept_  # Get the intercept (beta_0)
beta_1 = model.coef_[0]  # Get the coefficient (beta_1)
print('beta_0 is', beta_0)
print('beta_1 is', beta_1)
Note

If you are unfamiliar with the syntax model.intercept_ and model.coef_, this is a Scikit-learn convention. Attributes that are calculated (learned) during the training process always end with an underscore _ (e.g., intercept_, coef_). The intercept_ is a single value, while coef_ is an array containing the coefficients for each feature (in simple linear regression, it has only one item).

Making the Predictions

Now we can plot the regression line and predict new values using the trained model.

plt.scatter(X, y)  # Build a scatter plot
plt.plot(X, model.predict(X_reshaped), color='red')  # Plot the line using predictions
plt.show()

Now that we have the trained model, we can use the .predict() method to predict new values.

X_new = np.array([[65], [70], [75]])  # Feature values (must be 2D)
y_pred = model.predict(X_new)  # Predict the target
print('Predicted y:', y_pred)

As you can see, it is quite easy to obtain the parameters of a linear regression. Some libraries can also give you extra information about the fitted model.
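
For instance, a statistics-oriented library such as statsmodels (not used in this chapter, so treat this as an assumption-based sketch) can report standard errors, p-values, and confidence intervals alongside the parameters:

import statsmodels.api as sm

# statsmodels does not add an intercept automatically, so add a constant column
X_const = sm.add_constant(X)
ols_model = sm.OLS(y, X_const).fit()  # Fit ordinary least squares
print(ols_model.summary())            # Coefficients, p-values, R-squared, and more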
