# Linear Regression with Python

## 4. Choosing The Best Model

## Finding The Parameters

We now know that Linear Regression is just a line that best fits the data. But how can you tell which line is the best one?

Well, you can calculate the difference between the predicted value and the actual target value for each data point in the training set.

These differences are called **residuals** (or **errors**), and the goal is to make the residuals as small as possible.
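As a quick sketch, here is how the residuals could be computed for a handful of training points. The targets and predictions below are made-up numbers for illustration only:

```python
import numpy as np

# Hypothetical actual targets and the predictions of some candidate line
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.4])

# Residuals: actual value minus predicted value, one per training point
residuals = y_true - y_pred
print(residuals)  # → [ 0.2 -0.3  0.1 -0.4]
```

Note that the residuals can be positive or negative depending on whether the line passes below or above a point.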

## Ordinary Least Squares

The default approach is the **Ordinary Least Squares** (**OLS**) method:

Take each residual, **square it** (mainly to eliminate its sign), and **sum all of them**.

That sum is called the **SSR** (**Sum of Squared Residuals**), and the task is to find the parameters that minimize it.
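Continuing the sketch above, the SSR is just the squaring-and-summing step applied to the residuals (again, made-up numbers):

```python
import numpy as np

# Residuals from some candidate line (hypothetical values)
residuals = np.array([0.2, -0.3, 0.1, -0.4])

# Square each residual (this removes the sign), then sum them up
ssr = np.sum(residuals ** 2)
print(ssr)  # → 0.3
```

A line with a smaller SSR fits the training data better, so OLS picks the line whose SSR is the smallest.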

## Normal Equation

Fortunately, we do not need to try every possible line and calculate the SSR for each. The task of minimizing the SSR has a closed-form mathematical solution that is not very computationally expensive.

This solution is called the **Normal Equation**.

This equation gives us the parameters of a line with the least SSR.
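To give a rough idea of what this looks like in practice, here is a minimal NumPy sketch of the Normal Equation for simple linear regression. The data values are hypothetical, chosen just for illustration:

```python
import numpy as np

# Toy training data: one feature x and a target y (hypothetical values)
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.1, 4.9, 7.2, 8.8])

# Prepend a column of ones so the intercept is learned as a parameter
X_b = np.c_[np.ones((X.shape[0], 1)), X]

# Normal Equation: theta = (X^T X)^(-1) X^T y
theta = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

intercept, slope = theta
print(intercept, slope)  # → 1.15 1.94
```

These two numbers define the line with the least SSR for this toy data set.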

Did you not understand how it works? No worries! It is pretty complex math, but you don't have to calculate the parameters by hand. Many libraries have already implemented linear regression.

So hop into the following chapters. They will show you how to build the linear regression model using those libraries.
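As a small preview (assuming scikit-learn as the library, with the same hypothetical toy data as before), fitting a linear regression takes only a few lines:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: one feature x and a target y
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.1, 4.9, 7.2, 8.8])

# The library finds the least-SSR parameters for us
model = LinearRegression()
model.fit(X, y)

print(model.intercept_, model.coef_[0])      # the learned parameters
print(model.predict(np.array([[5.0]]))[0])   # predict for a new point
```

Under the hood the library solves the same minimization problem, so `model.intercept_` and `model.coef_` match what the Normal Equation would give.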

## Quiz

Consider the image above. Which regression line is better?


`y_true - y_predicted` is called
