ML Introduction with scikit-learn

Challenge: Creating a Complete ML Pipeline

Now let's build a complete pipeline that ends with a final estimator. The result is a trained prediction pipeline: calling its .predict() method on new instances runs them through every preprocessing step and then through the model.
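To show what .predict() on a fitted pipeline actually does, here is a minimal sketch. It assumes hypothetical numeric toy data and a two-step StandardScaler + KNeighborsClassifier pipeline (with n_neighbors=3 chosen only because the toy set is tiny). It is not the penguins pipeline. The pipeline's prediction matches applying its fitted steps by hand.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: two numeric features, two classes (0 and 1)
X_train = np.array([[1.0, 200.0], [1.2, 210.0], [0.9, 190.0],
                    [5.0, 600.0], [5.3, 620.0], [4.8, 590.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# The last step is an estimator, so the whole pipeline supports .fit() and .predict()
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
pipe.fit(X_train, y_train)

# New instances are scaled by the fitted scaler, then classified by KNN
X_new = np.array([[1.1, 205.0], [5.1, 610.0]])
print(pipe.predict(X_new))  # [0 1]

# The same result, obtained by applying the pipeline's fitted steps manually
scaler = pipe.named_steps['standardscaler']
knn = pipe.named_steps['kneighborsclassifier']
print(knn.predict(scaler.transform(X_new)))  # [0 1]
```

Because the scaler inside the pipeline was fitted during pipe.fit(), new data is transformed with the training statistics and never refitted.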

Before training the predictor (model), encode the target y. This is done separately from the pipeline built for X, because the steps of a pipeline transform only the features. Remember that LabelEncoder is the encoder meant for the target.
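A minimal sketch of encoding and decoding a target with LabelEncoder, assuming a short hypothetical list of species names rather than the real dataset column:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical target values using penguin species names
y = ['Gentoo', 'Adelie', 'Chinstrap', 'Adelie']

label_enc = LabelEncoder()
y_encoded = label_enc.fit_transform(y)  # classes are sorted, then mapped to 0, 1, 2
print(label_enc.classes_)               # ['Adelie' 'Chinstrap' 'Gentoo']
print(y_encoded)                        # [2 0 1 0]

# inverse_transform maps integer predictions back to the original names
print(label_enc.inverse_transform([0, 2]))  # ['Adelie' 'Gentoo']
```

Keeping the fitted label_enc object around is what makes it possible to decode the model's integer predictions later.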

Task


You are given the same penguins dataset. Your task is to build a pipeline with KNeighborsClassifier as the final estimator, train it, and predict the values for X itself.

  1. Encode the y variable with LabelEncoder.
  2. Create a pipeline containing ct, SimpleImputer, StandardScaler, and KNeighborsClassifier, in that order.
  3. Train the pipe object using the features X and the target y.
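The steps above can be sketched on a few hypothetical rows. The toy DataFrame below is an assumption: only 'island' and 'sex' come from the task, while 'mass_g' is an illustrative numeric column name that may not match penguins.csv, and y is given as already label-encoded integers. The sketch also shows the step names that make_pipeline generates automatically.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy rows; 'mass_g' is an illustrative column name with one missing value
X = pd.DataFrame({
    'island': ['Biscoe', 'Dream', 'Biscoe', 'Torgersen', 'Dream', 'Biscoe'],
    'sex': ['MALE', 'FEMALE', 'FEMALE', 'MALE', 'MALE', 'FEMALE'],
    'mass_g': [5000.0, 3500.0, np.nan, 3700.0, 3800.0, 4900.0],
})
y = np.array([2, 1, 2, 0, 1, 2])  # target already encoded (e.g. by LabelEncoder)

# One-hot encode the categorical columns, pass the numeric column through unchanged
ct = make_column_transformer((OneHotEncoder(), ['island', 'sex']),
                             remainder='passthrough')
# Order matters: encode, impute missing values, scale, then classify
pipe = make_pipeline(ct,
                     SimpleImputer(strategy='most_frequent'),
                     StandardScaler(),
                     KNeighborsClassifier())
pipe.fit(X, y)

# make_pipeline names each step after its class, in lowercase
print([name for name, _ in pipe.steps])
```

Since the imputer comes after the column transformer, it receives the already encoded array, which is why a single SimpleImputer can handle every column.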

Solution

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, LabelEncoder, StandardScaler
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins.csv')
# Remove rows with more than one missing value
df = df[df.isna().sum(axis=1) < 2]
# Assign the X, y variables
X, y = df.drop('species', axis=1), df['species']
# Encode the target
label_enc = LabelEncoder()
y = label_enc.fit_transform(y)
# Create the ColumnTransformer for encoding features
ct = make_column_transformer((OneHotEncoder(), ['island', 'sex']),
                             remainder='passthrough')
# Make a Pipeline of ct, SimpleImputer, StandardScaler, and KNeighborsClassifier
pipe = make_pipeline(ct,
                     SimpleImputer(strategy='most_frequent'),
                     StandardScaler(),
                     KNeighborsClassifier())
# Train the model
pipe.fit(X, y)
# Print predictions
y_pred = pipe.predict(X) # Get encoded predictions
print(label_enc.inverse_transform(y_pred)) # Decode predictions and print


Section 3. Chapter 6

Starter code (fill in the ___ blanks):
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, LabelEncoder, StandardScaler
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins.csv')
# Remove rows with more than one missing value
df = df[df.isna().sum(axis=1) < 2]
# Assign the X, y variables
X, y = df.drop('species', axis=1), df['species']
# Encode the target
label_enc = LabelEncoder()
y = ___
# Create the ColumnTransformer for encoding features
ct = make_column_transformer((OneHotEncoder(), ['island', 'sex']),
                             remainder='passthrough')
# Make a Pipeline of ct, SimpleImputer, StandardScaler, and KNeighborsClassifier
pipe = ___(___, ___(___=___), ___(), ___())
# Train the model
___
# Print predictions
y_pred = pipe.predict(X) # Get encoded predictions
print(label_enc.inverse_transform(y_pred)) # Decode predictions and print
