# ML Introduction with scikit-learn


## Preprocessing Summary

That's it for the preprocessing. The three problems we addressed were **missing values**, **categorical values**, and **unscaled data**.

These are among the most common problems in real datasets, so imputation, encoding, and scaling appear in almost every pipeline.

Soon you will learn how to make pipelines in `sklearn`, making it easy to put everything together.

Now let's review the transformers we learned:

#### Imputers (Dealing with missing values)

| Imputer | What for |
|---|---|
| `SimpleImputer(strategy='most_frequent')` | Impute categorical data |
| `SimpleImputer(strategy='mean')` / `SimpleImputer(strategy='median')` | Impute numerical data |
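As a quick reminder of how the imputers behave, here is a minimal sketch. The toy arrays and variable names are made up for illustration and are not part of the course dataset.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy columns, each with one missing value (np.nan)
colors = np.array([["red"], ["blue"], [np.nan], ["red"]], dtype=object)
heights = np.array([[1.0], [2.0], [3.0], [np.nan]])

# Categorical column: fill the gap with the most frequent category
cat_imputer = SimpleImputer(strategy="most_frequent")
colors_filled = cat_imputer.fit_transform(colors)

# Numerical column: fill the gap with the mean of the observed values
num_imputer = SimpleImputer(strategy="mean")
heights_filled = num_imputer.fit_transform(heights)

print(colors_filled.ravel())
print(heights_filled.ravel())
```

In this toy data the missing color becomes `'red'` (the most frequent category) and the missing height becomes `2.0` (the mean of 1, 2, and 3).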

#### Encoders (Dealing with categorical values)

| Encoder | What for |
|---|---|
| `OrdinalEncoder` | Encode ordinal features |
| `OneHotEncoder` | Encode nominal features |
| `LabelEncoder` | Encode target |
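The three encoders can be compared side by side in a short sketch. The feature values and the category order passed to `OrdinalEncoder` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

# Ordinal feature: the order small < medium < large is given explicitly
sizes = np.array([["small"], ["large"], ["medium"], ["small"]])
ordinal = OrdinalEncoder(categories=[["small", "medium", "large"]])
sizes_encoded = ordinal.fit_transform(sizes)

# Nominal feature: one binary column per category (no order implied)
colors = np.array([["red"], ["blue"], ["red"]])
onehot = OneHotEncoder()
colors_encoded = onehot.fit_transform(colors).toarray()

# Target: LabelEncoder expects a 1-D array of labels
target = ["yes", "no", "yes"]
label = LabelEncoder()
target_encoded = label.fit_transform(target)

print(sizes_encoded.ravel())   # small -> 0, medium -> 1, large -> 2
print(onehot.categories_)      # column order used by the one-hot encoding
print(colors_encoded)
print(target_encoded)          # classes are sorted alphabetically: no -> 0, yes -> 1
```

Note that `OrdinalEncoder` and `OneHotEncoder` take 2-D feature arrays, while `LabelEncoder` takes a 1-D target.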

#### Scalers (Dealing with different scales)

| Scaler | What for |
|---|---|
| `MinMaxScaler` | Scale the features to a [0, 1] range |
| `MaxAbsScaler` | Scale the features to a [-1, 1] range |
| `StandardScaler` | Scale the features so that the mean is 0 and the variance is 1 |
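To see the difference between the three scalers, the sketch below applies each of them to the same made-up single-feature array.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler, MinMaxScaler, StandardScaler

# Hypothetical feature with values from -2 to 4
X = np.array([[-2.0], [0.0], [2.0], [4.0]])

# (x - min) / (max - min): the smallest value maps to 0, the largest to 1
X_minmax = MinMaxScaler().fit_transform(X)

# x / max(|x|): values land inside [-1, 1] and zeros stay zeros
X_maxabs = MaxAbsScaler().fit_transform(X)

# (x - mean) / std: the result has mean 0 and variance 1
X_standard = StandardScaler().fit_transform(X)

print(X_minmax.ravel())
print(X_maxabs.ravel())
print(X_standard.mean(), X_standard.var())
```

Here `MaxAbsScaler` divides by 4, so the scaled values are -0.5, 0, 0.5, and 1: they stay within [-1, 1] but do not have to reach -1.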


