ML Introduction with scikit-learn
Preprocessing Summary
That's it for preprocessing. We addressed three problems: missing values, categorical values, and unscaled data.
These are among the most common problems in real datasets, so imputation, encoding, and scaling appear in almost every pipeline.
Soon you will learn how to make pipelines in sklearn, making it easy to put everything together.
Now let's review the transformers we learned:
Imputers (Dealing with missing values)
Imputer | What for |
---|---|
SimpleImputer(strategy='most_frequent') | Impute categorical data |
SimpleImputer(strategy='mean') or SimpleImputer(strategy='median') | Impute numerical data |
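As a quick reminder of how the imputers behave, here is a minimal sketch. The `colors` and `weights` arrays are made-up toy data, not part of the course dataset.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy data: one categorical column and one numerical column,
# each with a single missing value (np.nan).
colors = np.array([["red"], ["blue"], [np.nan], ["red"]], dtype=object)
weights = np.array([[1.0], [np.nan], [2.0], [6.0]])

# Categorical data: fill the gap with the most frequent category ("red").
colors_filled = SimpleImputer(strategy="most_frequent").fit_transform(colors)

# Numerical data: fill the gap with the column mean (3.0) or median (2.0).
weights_mean = SimpleImputer(strategy="mean").fit_transform(weights)
weights_median = SimpleImputer(strategy="median").fit_transform(weights)

print(colors_filled.ravel())
print(weights_mean.ravel())
print(weights_median.ravel())
```

The median is usually the safer choice when a numerical column contains outliers, since a few extreme values can pull the mean far from typical values.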
Encoders (Dealing with categorical values)
Encoder | What for |
---|---|
OrdinalEncoder | Encode ordinal features |
OneHotEncoder | Encode nominal features |
LabelEncoder | Encode target |
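The three encoders can be compared side by side in a short sketch. The `sizes`, `cities`, and `labels` values below are invented for illustration, and the category order passed to `OrdinalEncoder` is an assumption about what the natural order of that feature would be.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

# Ordinal feature: the categories have a natural order, so we pass it explicitly.
sizes = np.array([["small"], ["large"], ["medium"], ["small"]])
ordinal = OrdinalEncoder(categories=[["small", "medium", "large"]])
sizes_encoded = ordinal.fit_transform(sizes)  # small=0, medium=1, large=2

# Nominal feature: no order, so each category gets its own 0/1 column.
cities = np.array([["Paris"], ["Rome"], ["Paris"], ["Oslo"]])
onehot = OneHotEncoder()
# The encoder returns a sparse matrix by default; convert it for display.
cities_encoded = onehot.fit_transform(cities).toarray()

# Target: LabelEncoder works on a 1D array of labels.
labels = ["spam", "ham", "spam", "ham"]
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(labels)  # classes are sorted: ham=0, spam=1

print(sizes_encoded.ravel())
print(cities_encoded)
print(y)
```

Note that `OrdinalEncoder` and `OneHotEncoder` expect 2D feature arrays, while `LabelEncoder` expects a 1D target, which is why it is reserved for the target column.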
Scalers (Dealing with different scales)
Scaler | What for |
---|---|
MinMaxScaler | Scale the features to a [0,1] range |
MaxAbsScaler | Scale the features to a [-1,1] range |
StandardScaler | Scale the features so that the mean is 0 and the variance is 1 |
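To see the difference between the scalers, the sketch below applies all three to the same column. The array `X` is made-up toy data chosen to include a negative value.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler, MinMaxScaler, StandardScaler

# Hypothetical single feature with both negative and positive values.
X = np.array([[-4.0], [0.0], [2.0], [6.0]])

# MinMaxScaler: (x - min) / (max - min), so the minimum maps to 0 and the maximum to 1.
X_minmax = MinMaxScaler().fit_transform(X)

# MaxAbsScaler: x / max(|x|), so values fall within [-1, 1] and zeros stay zero.
X_maxabs = MaxAbsScaler().fit_transform(X)

# StandardScaler: (x - mean) / std, so the result has mean 0 and variance 1.
X_standard = StandardScaler().fit_transform(X)

print(X_minmax.ravel())
print(X_maxabs.ravel())
print(X_standard.ravel())
```

With `MaxAbsScaler`, the scaled values are guaranteed to lie inside [-1, 1], but they only reach both ends of that range if the largest positive and largest negative values have the same magnitude.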