What is Pipeline

ML Introduction with scikit-learn

In the previous section, we completed three preprocessing steps: imputing, encoding, and scaling.

We did it step by step, transforming the needed columns and collecting them back into the X array.
This is a tedious process, especially when a OneHotEncoder changes the number of columns.
Another problem is that, to make a prediction, new instances must go through the same preprocessing steps, so we would need to perform all those transformations again.
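
To make the problem concrete, here is a minimal sketch of the manual approach. The toy arrays and variable names (X_cat, X_num, new_cat, new_num) are hypothetical and are not taken from the previous section; the point is only that every step has to be repeated by hand for new instances.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one categorical column and one numeric column
# that contains a missing value.
X_cat = np.array([["red"], ["blue"], ["green"], ["blue"]])
X_num = np.array([[1.0], [np.nan], [3.0], [5.0]])

imputer = SimpleImputer(strategy="mean")
encoder = OneHotEncoder()
scaler = StandardScaler()

# Training data: fit and apply each transformer by hand.
num_imputed = imputer.fit_transform(X_num)
num_scaled = scaler.fit_transform(num_imputed)
cat_encoded = encoder.fit_transform(X_cat).toarray()

# Collect the pieces back into one array.
# The single categorical column has become three one-hot columns.
X_train = np.hstack([cat_encoded, num_scaled])

# A new instance must go through the same fitted steps, in the same order.
new_cat = np.array([["green"]])
new_num = np.array([[np.nan]])
new_row = np.hstack([
    encoder.transform(new_cat).toarray(),
    scaler.transform(imputer.transform(new_num)),
])

print(X_train.shape, new_row.shape)
```

Forgetting a step, or applying the steps in a different order for new data, would silently produce features that do not match the training set.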

Luckily, scikit-learn provides the Pipeline class, a simple way to collect all those transformations together so that it is easier to transform both training data and new instances.

A Pipeline is a container for all the transformers (and the final estimator, as you will see later).
Calling the .fit_transform() method of a Pipeline object sequentially calls each transformer's .fit_transform(), passing the output of each step to the next.
This way, you only need to call .fit_transform() once to transform the training set, and then .transform() to apply the already-fitted steps to new instances.
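
As a rough illustration, a Pipeline that chains an imputer and a scaler might look like the sketch below. The data and the step names ("imputer", "scaler") are assumptions made for this example, and categorical encoding is left out to keep it short.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric data with missing values.
X_train = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
    [5.0, 40.0],
])
X_new = np.array([[np.nan, 30.0]])

# Each step is a (name, transformer) pair; steps run in the order listed.
pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
])

# One call fits every step on the training set and transforms it.
X_train_t = pipe.fit_transform(X_train)

# New instances reuse the fitted steps: nothing is refitted here.
X_new_t = pipe.transform(X_new)

print(X_train_t)
print(X_new_t)
```

The missing value in X_new is filled with the mean learned from the training set, and the scaling uses the training set's statistics as well, which is what keeps new instances consistent with the training data.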

Section 3. Chapter 1