Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Lernen Create a Complete ML Pipeline | Pipelines
ML Introduction with scikit-learn

book
Create a Complete ML Pipeline

Now let's build a proper pipeline with the final estimator. As a result, we will get a trained prediction pipeline that can be used for predicting new instances simply by calling the .predict() method.
To train a predictor (model), you need y to be encoded. This is done separately from the Pipeline we build for X.
Remember that LabelEncoder is used for encoding the target.

You have the same Penguins dataset.
The task is to build a pipeline with KNeighborsClassifier as a final estimator, train it, and predict values for the X itself.
Since the predictions will be encoded (0, 1, or 2), we will use the .inverse_transform() method of LabelEncoder to get the predictions back to 'Adelie', 'Chinstrap', or 'Gentoo'.

Aufgabe

Swipe to start coding

  1. Encode the y variable using the LabelEncoder.
  2. Make a pipeline containing ct, SimpleImputer, StandardScaler, and KNeighborsClassifier (in that order).
  3. Train the pipe object.

Lösung

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, LabelEncoder, StandardScaler
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins.csv')
# Removing rows with more than 1 null
df = df[df.isna().sum(axis=1) < 2]
# Assigining X, y variables
X, y = df.drop('species', axis=1), df['species']
# Encode the target
label_enc = LabelEncoder()
y = label_enc.fit_transform(y)
# Create the ColumnTransformer for encoding features
ct = make_column_transformer((OneHotEncoder(), ['island', 'sex']),
remainder='passthrough')
# Make a Pipeline of ct, SimpleImputer, and StandardScaler
pipe = make_pipeline(ct,
SimpleImputer(strategy='most_frequent'),
StandardScaler(),
KNeighborsClassifier()
)
# Train the model
pipe.fit(X, y)
# Print predictions
y_pred = pipe.predict(X) # Get encoded predictions
print(label_enc.inverse_transform(y_pred)) # Decode predictions and print

War alles klar?

Wie können wir es verbessern?

Danke für Ihr Feedback!

Abschnitt 3. Kapitel 6
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, LabelEncoder, StandardScaler
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins.csv')
# Removing rows with more than 1 null
df = df[df.isna().sum(axis=1) < 2]
# Assigining X, y variables
X, y = df.drop('species', axis=1), df['species']
# Encode the target
label_enc = LabelEncoder()
y = label_enc.___(___)
# Create the ColumnTransformer for encoding features
ct = make_column_transformer((OneHotEncoder(), ['island', 'sex']),
remainder='passthrough')
# Make a Pipeline of ct, SimpleImputer, and StandardScaler
pipe = make_pipeline(ct,
___(strategy='most_frequent'),
___(),
___()
)
# Train the model
pipe.___(X, y)
# Print predictions
y_pred = pipe.predict(X) # Get encoded predictions
print(label_enc.inverse_transform(y_pred)) # Decode predictions and print
toggle bottom row
some-alt