StandardScaler, MinMaxScaler, MaxAbsScaler | Preprocessing Data with Scikit-learn
StandardScaler, MinMaxScaler, MaxAbsScaler

There are three popular approaches to scaling the data:

  • MinMaxScaler – scales features to a [0, 1] range;
  • MaxAbsScaler – scales features so that the maximum absolute value is 1 (so the data is guaranteed to be in a [-1, 1] range);
  • StandardScaler – standardizes features so that the mean is 0 and the variance is 1.

To demonstrate how the scalers work, we will use the 'culmen_depth_mm' and 'body_mass_g' features of the Penguins dataset. Let's plot them.

MinMaxScaler

The MinMaxScaler works by subtracting the minimum value (so that the values start at zero) and then dividing by (x_max - x_min), so that the largest value becomes 1.
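The formula can be checked by hand. The snippet below is a minimal sketch that uses a small made-up column of values (an assumption for illustration, not taken from the Penguins dataset) and compares the manual computation with the result of MinMaxScaler.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical body-mass-like values in grams (made up for illustration)
x = np.array([[3000.0], [3500.0], [4000.0], [5000.0]])

# Manual min-max scaling: (x - x_min) / (x_max - x_min), computed per column
x_min, x_max = x.min(axis=0), x.max(axis=0)
manual = (x - x_min) / (x_max - x_min)

# The same transformation done by scikit-learn
scaled = MinMaxScaler().fit_transform(x)

print(manual.ravel())               # values: 0, 0.25, 0.5, 1
print(np.allclose(manual, scaled))  # True
```

The smallest value maps to 0 and the largest to 1, and everything else falls in between.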

Here is the video showing how MinMaxScaler works:

MaxAbsScaler

The MaxAbsScaler works by finding the maximum absolute value and dividing each value by it. This ensures that the maximum absolute value is 1.
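As a minimal sketch of this idea, the snippet below divides a small made-up column (hypothetical values, chosen to include a negative number) by its maximum absolute value and compares the result with MaxAbsScaler.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical values with mixed signs (made up for illustration)
x = np.array([[-4.0], [-1.0], [0.0], [2.0]])

# Manual scaling: divide each value by the column's maximum absolute value
max_abs = np.abs(x).max(axis=0)
manual = x / max_abs

# The same transformation done by scikit-learn
scaled = MaxAbsScaler().fit_transform(x)

print(manual.ravel())               # values: -1, -0.25, 0, 0.5
print(np.allclose(manual, scaled))  # True
```

Note that the data is not shifted, so zeros stay zeros and the sign of each value is preserved.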

Here is the video showing how MaxAbsScaler works:

StandardScaler

The idea of StandardScaler comes from statistics. It works by subtracting the mean (to center around zero) and dividing by the standard deviation (to make the variance equal to 1).
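The same computation can be sketched by hand. The values below are made up for illustration (their mean is 5 and their standard deviation is 2). StandardScaler uses the population standard deviation, which is what `np.std` computes by default.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical values with mean 5 and population standard deviation 2
x = np.array([[2.0], [4.0], [4.0], [4.0], [5.0], [5.0], [7.0], [9.0]])

# Manual standardization: subtract the mean, divide by the standard deviation
mean, std = x.mean(axis=0), x.std(axis=0)
manual = (x - mean) / std

# The same transformation done by scikit-learn
scaled = StandardScaler().fit_transform(x)

print(np.allclose(manual, scaled))  # True
print(scaled.mean(), scaled.std())  # approximately 0 and 1
```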

Note

If you do not understand what the mean, standard deviation, and variance are, you can check our Learning Statistics with Python course.
However, this knowledge is not mandatory to move on.

Here is a video showing how StandardScaler works.

Let's look at a coding example using MinMaxScaler (the other scalers are used in exactly the same way).

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_imputed_encoded.csv')
# Assign X, y variables
X, y = df.drop('species', axis=1), df['species']
# Initialize a MinMaxScaler object and transform X
minmax = MinMaxScaler()
X = minmax.fit_transform(X)
print(X)
```

The output is not the prettiest, since scalers return a NumPy array rather than a DataFrame (so the column names are lost), but with pipelines this won't be a problem.
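If you want to inspect the scaled values with their column names outside a pipeline, one option is to wrap the array back into a DataFrame. This is a minimal sketch, assuming a small made-up DataFrame in place of the real feature matrix. The variable name `X_scaled` is introduced here for illustration.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the feature matrix X (values are made up)
X = pd.DataFrame({
    'culmen_depth_mm': [18.7, 17.4, 18.0, 15.0],
    'body_mass_g': [3750.0, 3800.0, 3250.0, 5000.0],
})

scaler = MinMaxScaler()
# fit_transform returns a NumPy array; reuse the original column names and index
X_scaled = pd.DataFrame(scaler.fit_transform(X), columns=X.columns, index=X.index)
print(X_scaled)
```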

Note

Scale only the feature columns (the X variable). There is no need to scale the target: doing so only adds the extra step of inverse-transforming the predictions back to the original scale.

Which scaler to use?

StandardScaler is less sensitive to outliers than the other two scalers, whose output is determined entirely by the most extreme values, so it is a good default choice.
If you prefer not to use StandardScaler, the choice between MinMaxScaler and MaxAbsScaler comes down to personal preference: scaling the data to the [0, 1] range or to the [-1, 1] range.
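To compare the output ranges side by side, the sketch below fits all three scalers on the same made-up column (hypothetical values with mixed signs) and records the minimum and maximum of each result. The `results` dictionary is introduced here for illustration only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, MaxAbsScaler, StandardScaler

# Hypothetical single feature with negative and positive values
x = np.array([[-2.0], [0.0], [1.0], [4.0]])

results = {}
for scaler in (MinMaxScaler(), MaxAbsScaler(), StandardScaler()):
    scaled = scaler.fit_transform(x)
    # Store the smallest and largest scaled value for each scaler
    results[type(scaler).__name__] = (scaled.min(), scaled.max())

for name, (lo, hi) in results.items():
    print(f'{name}: min={lo:.3f}, max={hi:.3f}')
```

With this data, MinMaxScaler spans exactly [0, 1], while MaxAbsScaler stays within [-1, 1] without necessarily reaching -1 (here the minimum is -0.5). StandardScaler has no fixed range.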
