ML Introduction with scikit-learn

Final Estimator

So far, we have only used a Pipeline for preprocessing.
However, most of the time, that is not the endpoint. Usually, after preprocessing, we want to feed the transformed data to a predictor (a model).
That's why the Pipeline class allows the final step to be any estimator, usually a predictor.
The following illustration shows how the Pipeline works when its last step is a predictor.


When we call the .fit() method of a pipeline, .fit_transform() is called on every transformer in turn, and the final estimator is then fitted on the transformed data.
But when we call the .predict() method, only .transform() is called on each transformer before the final estimator makes its prediction.
The .predict() method is mostly used for predicting new instances, which must be transformed in exactly the same way as the training set was during .fit().
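
To make the two code paths concrete, here is a minimal sketch that compares a pipeline with the same steps written out by hand. The toy data, the step names "scaler" and "model", and the choice of StandardScaler with KNeighborsClassifier are assumptions made for illustration only.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two numeric features on very different scales.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 2)) * [1.0, 100.0]
y_train = (X_train[:, 0] > 0).astype(int)
X_new = rng.normal(size=(5, 2)) * [1.0, 100.0]

# Pipeline version: .fit() fits and applies the scaler, then fits the model;
# .predict() only applies the already fitted scaler, then predicts.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", KNeighborsClassifier(n_neighbors=3)),
])
pipe.fit(X_train, y_train)
pipe_pred = pipe.predict(X_new)

# Manual version of the same steps.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # training: fit_transform
model = KNeighborsClassifier(n_neighbors=3).fit(X_train_scaled, y_train)
manual_pred = model.predict(scaler.transform(X_new))  # new data: transform only

# Both approaches should give identical predictions.
print(np.array_equal(pipe_pred, manual_pred))
```

The manual version is what the pipeline saves us from writing, and from getting wrong, every time new data arrives.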

If we applied .fit_transform() instead of .transform() to new instances, every transformer would be refitted on the new data. The OneHotEncoder could then produce a different set of columns or a different column order, and scalers would most likely scale the data somewhat differently. As a result, new instances would be transformed differently from the training set, and predictions would be unreliable.
That is something to be aware of when you are not using pipelines. It is also one more benefit of pipelines: they handle these steps automatically.
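
The problem described above can be reproduced in a few lines. This is only an illustrative sketch: the color categories and the numeric values are made up, and the variable names are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical training data and new instances.
train_colors = np.array([["red"], ["green"], ["blue"]])
new_colors = np.array([["red"], ["blue"]])

encoder = OneHotEncoder()
encoder.fit(train_colors)

# Correct: reuse the encoder fitted on the training set.
correct = encoder.transform(new_colors).toarray()
# Wrong: refit on the new data, so the "green" column disappears.
wrong = OneHotEncoder().fit_transform(new_colors).toarray()

print(correct.shape)  # three columns, matching the training set
print(wrong.shape)    # only two columns

# The same issue with a scaler: refitting changes the mean and variance used.
train_values = np.array([[1.0], [2.0], [3.0]])
new_values = np.array([[10.0], [20.0]])

scaler = StandardScaler().fit(train_values)
consistent = scaler.transform(new_values)             # scaled with training statistics
refit = StandardScaler().fit_transform(new_values)    # scaled with its own statistics

print(consistent.ravel())
print(refit.ravel())
```

A model trained on three one-hot columns cannot make sense of an input with two, and values scaled with different statistics no longer mean the same thing to the model.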

To use a final estimator, you just need to add it as the last step of the pipeline. For example, in the next chapter, we will use a KNeighborsClassifier model as the final estimator.
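
A minimal sketch of what this can look like is given below. It is not the course's own snippet: the use of make_pipeline, the particular preprocessing steps (SimpleImputer and StandardScaler), and the toy data are all assumptions chosen for illustration. The explicit Pipeline class with named steps works the same way.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with one missing value.
X = np.array([
    [1.0, 200.0],
    [2.0, np.nan],
    [3.0, 180.0],
    [8.0, 900.0],
    [9.0, 950.0],
    [10.0, 870.0],
])
y = np.array([0, 0, 0, 1, 1, 1])

# The transformers come first; the final estimator is simply the last step.
pipe = make_pipeline(
    SimpleImputer(strategy="mean"),
    StandardScaler(),
    KNeighborsClassifier(n_neighbors=3),
)

pipe.fit(X, y)  # fits the preprocessing steps and then the model

# New instances go through the fitted preprocessing before prediction.
X_new = np.array([[2.5, 190.0], [9.5, np.nan]])
preds = pipe.predict(X_new)
print(preds)
```

Because the model is part of the pipeline, a single .fit() and a single .predict() call cover both preprocessing and modelling.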

Section 3. Chapter 5