Preprocessing Summary | Preprocessing Data with Scikit-learn
ML Introduction with scikit-learn

Preprocessing Summary

That's it for preprocessing. We addressed three problems: missing values, categorical values, and unscaled data.
These are among the most common problems in real datasets, so imputation, encoding, and scaling appear in almost every pipeline.
Soon you will learn how to build pipelines in sklearn, which make it easy to put all of these steps together.
Now let's review the transformers we covered:

Imputers (Dealing with missing values)

| Imputer | What for |
|---|---|
| SimpleImputer(strategy='most_frequent') | Impute categorical data |
| SimpleImputer(strategy='mean') or SimpleImputer(strategy='median') | Impute numerical data |
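
As a quick refresher, the imputers listed above can be tried on a tiny example. This is only a minimal sketch: the arrays `colors` and `heights` are made-up toy data, not part of the course dataset.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy columns (assumed for illustration only).
# Each column is 2-D because scikit-learn transformers expect 2-D input.
colors = np.array([["red"], ["blue"], [np.nan], ["red"]], dtype=object)
heights = np.array([[1.0], [np.nan], [3.0], [8.0]])

# Categorical column: fill the gap with the most frequent category.
colors_filled = SimpleImputer(strategy="most_frequent").fit_transform(colors)

# Numerical column: fill the gap with the mean or with the median.
heights_mean = SimpleImputer(strategy="mean").fit_transform(heights)
heights_median = SimpleImputer(strategy="median").fit_transform(heights)

print(colors_filled.ravel())   # missing color becomes 'red'
print(heights_mean.ravel())    # missing height becomes 4.0 (mean of 1, 3, 8)
print(heights_median.ravel())  # missing height becomes 3.0 (median of 1, 3, 8)
```

The mean and the median give different fill values here because the value 8.0 pulls the mean upward, which is one reason the median is often preferred when a column contains outliers.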

Encoders (Dealing with categorical values)

| Encoder | What for |
|---|---|
| OrdinalEncoder | Encode ordinal features |
| OneHotEncoder | Encode nominal features |
| LabelEncoder | Encode target |
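
The three encoders above might be used roughly as follows. This is a sketch under stated assumptions: the `sizes`, `colors`, and `target` values are invented toy data, and the category order passed to `OrdinalEncoder` is an assumed natural ordering.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

# Hypothetical toy data (assumed for illustration only).
sizes = np.array([["small"], ["large"], ["medium"], ["small"]])  # ordinal feature
colors = np.array([["red"], ["blue"], ["red"], ["green"]])       # nominal feature
target = ["yes", "no", "yes", "yes"]                             # target column

# Ordinal feature: pass the order explicitly so small < medium < large.
ordinal = OrdinalEncoder(categories=[["small", "medium", "large"]])
sizes_encoded = ordinal.fit_transform(sizes)

# Nominal feature: one column per category, sorted alphabetically
# (blue, green, red). The default output is sparse, so convert it to dense.
onehot = OneHotEncoder()
colors_encoded = onehot.fit_transform(colors).toarray()

# Target: LabelEncoder takes a 1-D array and assigns integers
# in sorted order of the class names ("no" -> 0, "yes" -> 1).
label = LabelEncoder()
target_encoded = label.fit_transform(target)

print(sizes_encoded.ravel())  # [0. 2. 1. 0.]
print(colors_encoded)
print(target_encoded)         # [1 0 1 1]
```

Note that `OrdinalEncoder` and `OneHotEncoder` work on 2-D feature arrays, while `LabelEncoder` works on a 1-D target, which is why it is reserved for the target column.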

Scalers (Dealing with different scales)

| Scaler | What for |
|---|---|
| MinMaxScaler | Scale the features to a [0,1] range |
| MaxAbsScaler | Scale the features to a [-1,1] range |
| StandardScaler | Scale the features so that the mean is 0 and the variance is 1 |
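
To compare the three scalers side by side, here is a small sketch on a single made-up feature. The array `X` is hypothetical toy data chosen so the arithmetic is easy to follow.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler, MinMaxScaler, StandardScaler

# Hypothetical single feature with values between -4 and 6.
X = np.array([[-4.0], [0.0], [2.0], [6.0]])

# MinMaxScaler: (x - min) / (max - min), so the minimum maps to 0 and the maximum to 1.
X_minmax = MinMaxScaler().fit_transform(X)

# MaxAbsScaler: x / max(|x|), so the value with the largest magnitude maps to 1 or -1.
X_maxabs = MaxAbsScaler().fit_transform(X)

# StandardScaler: (x - mean) / std, so the result has mean 0 and variance 1.
X_standard = StandardScaler().fit_transform(X)

print(X_minmax.ravel())
print(X_maxabs.ravel())
print(X_standard.ravel())
print(X_standard.mean(), X_standard.var())
```

With this data, `MaxAbsScaler` does not fill the whole [-1,1] interval: the largest magnitude is 6, so -4 maps to about -0.67 rather than -1. The scaler guarantees only that all values fall within [-1,1].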

Section 2. Chapter 12