Pipelines and Composition Patterns
Mastering scikit-learn API and Workflows

Building Pipelines

To streamline machine learning workflows, scikit-learn provides the Pipeline object. A Pipeline chains together a sequence of transformers and a final estimator, allowing you to treat the entire sequence as a single estimator. This means you can combine preprocessing steps (such as scaling or encoding) with your model, making your code more organized, less error-prone, and easier to maintain. By encapsulating multiple steps, a pipeline ensures that transformations are applied consistently during both training and prediction, reducing the risk of data leakage and simplifying cross-validation or grid search procedures.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

# Load example data
X, y = load_iris(return_X_y=True)

# Construct a pipeline with scaling and logistic regression
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression())
])

# Fit the pipeline to the data
pipeline.fit(X, y)

# Predict using the pipeline
predictions = pipeline.predict(X)
print(predictions[:5])
```
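Because a pipeline behaves like a single estimator, it can be passed directly to tools such as GridSearchCV. The sketch below illustrates this, using scikit-learn's `"<step_name>__<parameter>"` naming convention to reach the hyperparameters of an individual step; the particular grid of `C` values is just an illustrative choice.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression())
])

# Parameters of a step are addressed as "<step_name>__<parameter>";
# here we tune the regularization strength of the "classifier" step
param_grid = {"classifier__C": [0.1, 1.0, 10.0]}

# Each cross-validation fold re-fits the scaler on the training split only,
# so the search is free of data leakage
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```

Because scaling happens inside the pipeline, every fold of the search fits the scaler on that fold's training data alone, which is exactly the leakage protection described above.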

When you use a pipeline, each step is executed in the order you defined. In the example above, the data first passes through the StandardScaler, which standardizes features by removing the mean and scaling to unit variance. The output of the scaler is then passed directly to the LogisticRegression classifier. By calling fit on the pipeline, both the scaler and the classifier are trained sequentially: fit_transform is called on the scaler, and then fit is called on the classifier using the transformed data. Similarly, when you call predict, the input is automatically transformed by the scaler before being passed to the classifier for prediction. This ordered execution ensures that your preprocessing and modeling steps are always applied in a consistent and reproducible way.
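The sequential behavior described above can be verified by reproducing the pipeline's steps by hand; this sketch assumes only the same estimators used earlier and checks that the two approaches make identical predictions.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Pipeline version
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression())
])
pipeline.fit(X, y)

# Manual version: fit_transform on the scaler,
# then fit on the classifier using the transformed data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
clf = LogisticRegression().fit(X_scaled, y)

# predict on the pipeline transforms the input first, then predicts,
# so both versions should produce the same labels
same = np.array_equal(pipeline.predict(X), clf.predict(scaler.transform(X)))
print(same)
```

Writing the steps out manually also shows why the pipeline is safer: with the manual version, nothing stops you from accidentally calling `fit_transform` again on new data at prediction time, which the pipeline's automatic `transform` prevents.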

Which statement best describes the purpose of a scikit-learn Pipeline?

Section 3. Chapter 1
