Summary  
This chapter explains how to add a final estimator as the last step in a pipeline to chain preprocessing with prediction and why you should use transform instead of fit_transform on new data to maintain consistent feature encoding.

General domain of usage  
Machine learning

`Pipeline` was previously used for **preprocessing**, but its real purpose is to chain preprocessing with a **final predictor**. The last step in a pipeline can be any estimator (typically a model) that produces predictions.

When calling `.fit()`, each transformer runs `.fit_transform()`.
When calling `.predict()`, the pipeline uses `.transform()` before sending data to the final estimator.
This is required because **new data must be transformed exactly like the training data**.

Note

## Why `.transform()`?

Using `.fit_transform()` on new data could change encodings (e.g., in `OneHotEncoder`), creating mismatched columns and unreliable predictions.
`.transform()` guarantees **consistent preprocessing**, ignoring unseen categories and keeping the same column order.

Here is how one-hot encoded training data looks like:

If `.fit_transform()` were applied to **new instances**, the `OneHotEncoder` could generate columns in a different order or even introduce new ones. This would cause the new data to be transformed **inconsistently with the training set**, making predictions **unreliable**.


However, using `.transform()` ensures that the new data is encoded **exactly as the training data**, ignoring categories not seen during training:

## Adding the Final Estimator

Simply add the model as the **last step** of the pipeline:

```python
pipe = make_pipeline(
    ct,
    SimpleImputer(strategy='most_frequent'),
    StandardScaler(),
    KNeighborsClassifier()
)
pipe.fit(X, y)
pipe.predict(X_new)
```

This allows the whole workflow—preprocessing + prediction—to run with one call.

Machine learning drives modern technological innovation across all industries. Embark on a comprehensive introduction to predictive modeling by mastering foundational algorithmic concepts utilizing Scikit-Learn. Participants will construct robust classifiers, evaluate performance metrics, ultimately culminating in the development of a complete predictive project.

Final Estimator

Why .transform()?

Adding the Final Estimator

Why `.transform()`?