
ML Introduction with scikit-learn

Preprocessing Summary

That's it for preprocessing. We addressed three problems: missing values, categorical values, and unscaled data.
These are the most common problems in real-world datasets, so imputation, encoding, and scaling appear in almost every preprocessing pipeline.
Soon you will learn how to build pipelines in sklearn, which make it easy to put all these steps together.
Now let's review the transformers we learned:

Imputers (Dealing with missing values)

Imputer | What for
SimpleImputer(strategy='most_frequent') | Impute categorical data
SimpleImputer(strategy='mean') or SimpleImputer(strategy='median') | Impute numerical data
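
A minimal sketch of both SimpleImputer strategies, assuming a small hypothetical pandas DataFrame with a numerical 'age' column and a categorical 'city' column (the data and column names are invented for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data: one numerical and one categorical column with gaps
df = pd.DataFrame({
    "age": [25.0, np.nan, 35.0, 40.0],
    "city": ["Paris", "London", np.nan, "Paris"],
})

# Numerical column: replace NaN with the mean of the observed values
num_imputer = SimpleImputer(strategy="mean")
df[["age"]] = num_imputer.fit_transform(df[["age"]])

# Categorical column: replace NaN with the most frequent category
cat_imputer = SimpleImputer(strategy="most_frequent")
df[["city"]] = cat_imputer.fit_transform(df[["city"]])

print(df)
```

The median strategy is often preferred over the mean when a numerical column has outliers, because the median is less affected by extreme values.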

Encoders (Dealing with categorical values)

Encoder | What for
OrdinalEncoder | Encode ordinal features
OneHotEncoder | Encode nominal features
LabelEncoder | Encode the target
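
The sketch below applies each encoder to small hypothetical arrays (the category names are invented). Note that OrdinalEncoder orders categories alphabetically unless you pass the intended order through its categories parameter, and that LabelEncoder expects a 1-D array of target labels rather than a 2-D feature matrix:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder, LabelEncoder

# Hypothetical ordinal feature: pass the order explicitly,
# otherwise OrdinalEncoder would sort the categories alphabetically
sizes = np.array([["small"], ["large"], ["medium"], ["small"]])
ord_enc = OrdinalEncoder(categories=[["small", "medium", "large"]])
sizes_encoded = ord_enc.fit_transform(sizes)  # small=0, medium=1, large=2

# Hypothetical nominal feature: no natural order, so one binary column per category
colors = np.array([["red"], ["green"], ["blue"], ["green"]])
oh_enc = OneHotEncoder()
colors_encoded = oh_enc.fit_transform(colors).toarray()  # sparse output -> dense array
print(oh_enc.categories_)  # columns follow the sorted categories: blue, green, red

# Target: LabelEncoder maps each class label to an integer
y = np.array(["yes", "no", "yes", "no"])
label_enc = LabelEncoder()
y_encoded = label_enc.fit_transform(y)  # classes sorted: no=0, yes=1
print(y_encoded)
```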

Scalers (Dealing with different scales)

Scaler | What for
MinMaxScaler | Scale each feature to the [0, 1] range
MaxAbsScaler | Divide each feature by its maximum absolute value, so values fall in the [-1, 1] range
StandardScaler | Scale each feature to a mean of 0 and a variance of 1
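
A minimal sketch comparing the three scalers on one hypothetical feature column (the values are made up); the comments show the formula each scaler applies to every feature:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, MaxAbsScaler, StandardScaler

# Hypothetical single feature with values on an arbitrary scale
X = np.array([[-4.0], [0.0], [2.0], [6.0]])

X_minmax = MinMaxScaler().fit_transform(X)    # (x - min) / (max - min) -> [0, 1]
X_maxabs = MaxAbsScaler().fit_transform(X)    # x / max(|x|)            -> [-1, 1]
X_std = StandardScaler().fit_transform(X)     # (x - mean) / std        -> mean 0, variance 1

print(X_minmax.ravel())
print(X_maxabs.ravel())
print(X_std.mean(), X_std.std())
```

In a real project, fit the scaler on the training set only and call transform on the test set, so that the test data does not influence the scaling parameters.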


Section 2. Chapter 12