ML Introduction with scikit-learn
Final Estimator
So far, we have used a Pipeline only for preprocessing.
However, preprocessing is rarely the endpoint. Usually, we want to feed the transformed data into a predictor (model).
That's why the Pipeline class allows the final step to be any estimator, usually a predictor.
The following illustration shows how the Pipeline works when its last step is a predictor.
Note
When we call the .fit() method of a pipeline, .fit_transform() is called on every transformer, and .fit() is then called on the final estimator. But when we call the .predict() method, only .transform() is called on each transformer before the final estimator's .predict() runs.
The .predict() method is mostly used for predicting new instances, which must be transformed in exactly the same way as the training set was during .fit().
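To make the call sequence concrete, here is a minimal sketch that compares a pipeline with the same steps written by hand. The toy data, the step names, and the choice of StandardScaler and n_neighbors=1 are assumptions made for illustration only.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data, made up for illustration
X_train = np.array([[1.0, 200.0], [2.0, 180.0], [8.0, 20.0], [9.0, 10.0]])
y_train = np.array([0, 0, 1, 1])
X_new = np.array([[1.5, 190.0], [8.5, 15.0]])

# With a pipeline: fit() runs fit_transform() on the scaler, then fit() on the model;
# predict() runs transform() on the scaler, then predict() on the model
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", KNeighborsClassifier(n_neighbors=1)),
])
pipe.fit(X_train, y_train)
pipe_pred = pipe.predict(X_new)

# The same steps by hand
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from the training set
model = KNeighborsClassifier(n_neighbors=1).fit(X_train_scaled, y_train)
manual_pred = model.predict(scaler.transform(X_new))  # reuse the learned mean/std

print(pipe_pred, manual_pred)
```

Both approaches should give the same predictions; the pipeline simply removes the need to remember which method to call at each stage.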
If we applied the .fit_transform() method instead of .transform() to new instances, the OneHotEncoder could produce a different set of columns or put them in a different order, and scalers would most likely scale the data a bit differently. As a result, new instances would be transformed differently from the training set, and predictions would be unreliable.
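The effect can be seen in a small sketch. The values and category names below are invented for illustration, and the default settings of StandardScaler and OneHotEncoder are assumed.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy numeric feature, made up for illustration
X_train = np.array([[0.0], [10.0], [20.0]])
X_new = np.array([[100.0], [110.0], [120.0]])

scaler = StandardScaler().fit(X_train)
correct = scaler.transform(X_new)              # uses mean/std of the training set
wrong = StandardScaler().fit_transform(X_new)  # relearns mean/std from the new data

# Toy categorical feature, made up for illustration
colors_train = np.array([["red"], ["green"], ["blue"]])
colors_new = np.array([["red"], ["blue"]])

encoder = OneHotEncoder().fit(colors_train)
correct_shape = encoder.transform(colors_new).shape           # three columns, as in training
wrong_shape = OneHotEncoder().fit_transform(colors_new).shape  # only two columns

print(correct.ravel(), wrong.ravel())
print(correct_shape, wrong_shape)
```

Refitting on the new data centers it around zero even though it lies far from the training values, and the refitted encoder drops the column for the category that does not appear in the new data.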
This is something to watch for when you are not using pipelines. One more benefit of pipelines is that they handle these steps automatically.
To use a final estimator, you just need to add it as the last step of the pipeline. For example, in the next chapter, we will use a KNeighborsClassifier model as the final estimator. The syntax is as follows:
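As a minimal sketch, a pipeline ending in a KNeighborsClassifier could look like the code below. The preprocessing steps, the step names, the toy data, and the n_neighbors value are illustrative assumptions, not taken from the course.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data with one missing value, made up for illustration
X = np.array([[1.0, 5.0], [2.0, np.nan], [9.0, 1.0], [10.0, 0.5]])
y = np.array([0, 0, 1, 1])

pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),   # transformer
    ("scaler", StandardScaler()),                  # transformer
    ("knn", KNeighborsClassifier(n_neighbors=3)),  # final estimator (predictor)
])

pipe.fit(X, y)                 # fits the transformers, then the classifier
predictions = pipe.predict(X)  # transforms X with the fitted steps, then predicts
print(predictions)
```

Only the last step needs to be a predictor; every step before it must be a transformer.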