ML Introduction with scikit-learn

StandardScaler, MinMaxScaler, MaxAbsScaler

There are three popular approaches to scaling the data:

MinMaxScaler: scales features to the [0, 1] range;
MaxAbsScaler: scales features so that the maximum absolute value of each feature is 1 (so the data is guaranteed to be in the [-1, 1] range);
StandardScaler: standardizes features so that each has a mean of 0 and a variance of 1.

To demonstrate how the scalers work, we will use the 'culmen_depth_mm' and 'body_mass_g' features of the penguins dataset. Let's plot them.

MinMaxScaler

MinMaxScaler works by subtracting each feature's minimum value (so the values start from zero) and then dividing by the feature's range, x_max - x_min, so that no value exceeds 1. In short: x_scaled = (x - x_min) / (x_max - x_min).
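The computation described above can be checked numerically. The sketch below applies it by hand to a small array and compares the result with MinMaxScaler. The array values are hypothetical (made up to look roughly like culmen depth and body mass), and the sketch assumes NumPy and scikit-learn are installed.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature matrix: 4 rows, 2 columns
# (made-up values roughly like culmen depth in mm and body mass in g)
X = np.array([[18.7, 3750.0],
              [17.4, 3800.0],
              [13.2, 4500.0],
              [15.0, 5700.0]])

# Manual min-max scaling, column by column: (x - x_min) / (x_max - x_min)
x_min = X.min(axis=0)
x_max = X.max(axis=0)
X_manual = (X - x_min) / (x_max - x_min)

# The same transformation with scikit-learn
X_sklearn = MinMaxScaler().fit_transform(X)

print(np.allclose(X_manual, X_sklearn))  # True
print(X_sklearn.min(axis=0))  # each column's minimum is now 0
print(X_sklearn.max(axis=0))  # each column's maximum is now 1
```

Each column is scaled independently, which is why both features end up spanning exactly [0, 1] despite their very different original units.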

Here is the gif showing how MinMaxScaler works:

MaxAbsScaler

MaxAbsScaler works by finding the maximum absolute value of each feature and dividing every value of that feature by it. This ensures that the maximum absolute value becomes 1.
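Here is a minimal sketch of the same idea, computed by hand and with MaxAbsScaler. The data is hypothetical and deliberately includes negative values so that the [-1, 1] range is visible.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical data that includes negative values
X = np.array([[ 2.0, -10.0],
              [-4.0,   5.0],
              [ 1.0,  20.0]])

# Manual max-abs scaling: divide each column by its largest absolute value
max_abs = np.abs(X).max(axis=0)  # largest |value| per column: 4 and 20
X_manual = X / max_abs

# The same transformation with scikit-learn
X_sklearn = MaxAbsScaler().fit_transform(X)

print(np.allclose(X_manual, X_sklearn))  # True
print(X_sklearn)  # column 1: 0.5, -1.0, 0.25; column 2: -0.5, 0.25, 1.0
```

Note that the signs are preserved: negative values stay negative, and only their magnitude is rescaled.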

StandardScaler

The idea of StandardScaler comes from statistics. It subtracts each feature's mean (to center the data around zero) and divides by the feature's standard deviation (to make the variance equal to 1): z = (x - mean) / std.
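As a sketch, the standardization formula can be reproduced with NumPy and compared with StandardScaler. The input values are hypothetical. One detail worth knowing: StandardScaler uses the population standard deviation (ddof=0), which is also the default of np.std.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix (made-up values)
X = np.array([[18.7, 3750.0],
              [17.4, 3800.0],
              [13.2, 4500.0],
              [15.0, 5700.0]])

# Manual standardization: z = (x - mean) / std, column by column.
# np.std uses ddof=0 (population standard deviation), like StandardScaler.
mean = X.mean(axis=0)
std = X.std(axis=0)
X_manual = (X - mean) / std

# The same transformation with scikit-learn
X_sklearn = StandardScaler().fit_transform(X)

print(np.allclose(X_manual, X_sklearn))  # True
print(X_sklearn.mean(axis=0))  # approximately 0 for each column
print(X_sklearn.var(axis=0))   # approximately 1 for each column
```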

Let's look at a coding example using MinMaxScaler. Other scalers are used in the same way.


import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_imputed_encoded.csv')
# Separate the features (X) from the target (y)
X, y = df.drop('species', axis=1), df['species']
# Create a MinMaxScaler, then fit it to X and transform X in one step
minmax = MinMaxScaler()
X = minmax.fit_transform(X)
print(X)

The output is not very readable because scalers return a NumPy array, so the column names are lost. With pipelines, this won't be a problem.
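Because all three scalers share the same fit_transform interface, swapping one for another is a one-line change. The sketch below uses a small hypothetical DataFrame as a stand-in for two of the penguins features (the values are made up, so nothing is downloaded) and, as a simple workaround until pipelines are introduced, wraps each NumPy result back into a DataFrame with the original column names and index.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, MaxAbsScaler, StandardScaler

# Hypothetical stand-in for two of the penguins features (made-up values)
X = pd.DataFrame({'culmen_depth_mm': [18.7, 17.4, 13.2, 15.0],
                  'body_mass_g': [3750.0, 3800.0, 4500.0, 5700.0]})

results = {}
for scaler in (MinMaxScaler(), MaxAbsScaler(), StandardScaler()):
    # fit_transform returns a NumPy array, so the column names are lost
    X_scaled = scaler.fit_transform(X)
    # Rebuild a DataFrame with the original column names and index
    X_scaled_df = pd.DataFrame(X_scaled, columns=X.columns, index=X.index)
    results[type(scaler).__name__] = X_scaled_df
    print(type(scaler).__name__)
    print(X_scaled_df.round(3), end='\n\n')
```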

Which Scaler to Use?

StandardScaler is sensitive to outliers: a few extreme values shift the mean and inflate the standard deviation, which makes it less suitable as a default scaler. If you prefer an alternative to StandardScaler, the choice between MinMaxScaler and MaxAbsScaler is largely a matter of preference: MinMaxScaler scales the data to the [0, 1] range, while MaxAbsScaler scales it to the [-1, 1] range.
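The sketch below illustrates both points on hypothetical arrays (not taken from the penguins dataset). On a feature with negative values, MinMaxScaler maps everything into [0, 1], while MaxAbsScaler keeps the sign and maps into [-1, 1]. Then, adding one extreme value to an otherwise small feature inflates the standard deviation, so StandardScaler squeezes the regular values into a narrow band.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, MaxAbsScaler, StandardScaler

# 1) Output ranges on a hypothetical feature with negative values
x = np.array([[-5.0], [0.0], [5.0], [10.0]])
minmax_out = MinMaxScaler().fit_transform(x).ravel()  # 0, 1/3, 2/3, 1
maxabs_out = MaxAbsScaler().fit_transform(x).ravel()  # -0.5, 0, 0.5, 1
print(minmax_out)
print(maxabs_out)

# 2) Effect of a single outlier on StandardScaler
clean = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
with_outlier = np.vstack([clean, [[100.0]]])  # same values plus one extreme point

z_clean = StandardScaler().fit_transform(clean).ravel()
z_outlier = StandardScaler().fit_transform(with_outlier).ravel()

# Spread (max - min) of the five regular values after scaling
spread_clean = np.ptp(z_clean)
spread_outlier = np.ptp(z_outlier[:5])
print(round(spread_clean, 2), round(spread_outlier, 2))  # about 2.83 vs about 0.11
```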

1. What is the primary purpose of using `MinMaxScaler` in data preprocessing?

2. Why might you reconsider using `StandardScaler` for your dataset?

Section 2. Chapter 10
