What is Pipeline
In the previous section, we completed three preprocessing steps: imputing, encoding, and scaling.
We did it step by step, transforming the needed columns and collecting them back to the X
array. It is a tedious process, especially when there is an OneHotEncoder
that changes the number of columns.
Another problem with it is that to make a prediction, new instances should go through the same preprocessing steps, so we would need to perform all those transformations again.
Luckily, Scikit-learn provides a Pipeline
class – a simple way to collect all those transformations together, so it is easier to transform both training data and new instances.
A Pipeline
serves as a container for a sequence of transformers, and eventually, an estimator. When you invoke the .fit_transform()
method on a Pipeline
, it sequentially applies the .fit_transform()
method of each transformer to the data.
python
This streamlined approach means you only need to call .fit_transform()
once on the training set and subsequently use the .transform()
method to process new instances.
Дякуємо за ваш відгук!
Запитати АІ
Запитати АІ
Запитайте про що завгодно або спробуйте одне із запропонованих запитань, щоб почати наш чат