Challenge: Creating a Complete ML Pipeline | Pipelines
ML Introduction with scikit-learn


Now let's build a complete pipeline that ends with a final estimator. The result is a trained prediction pipeline: calling its .predict() method on new instances runs every preprocessing step and then the model.
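To make this concrete, here is a minimal sketch on small synthetic numeric data (the arrays, the mean imputation, and n_neighbors=3 are assumptions for illustration, not part of the challenge). It checks that a single pipe.predict() call gives the same result as applying the already-fitted imputer and scaler by hand and then calling the classifier's predict().

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small synthetic training set (made-up values) with one missing entry
X_train = np.array([[1.0, 2.0], [1.5, np.nan], [5.0, 8.0],
                    [6.0, 9.0], [5.5, 8.5], [1.2, 1.8]])
y_train = np.array([0, 0, 1, 1, 1, 0])
X_new = np.array([[1.1, 2.1], [5.8, np.nan]])

pipe = make_pipeline(SimpleImputer(strategy='mean'),
                     StandardScaler(),
                     KNeighborsClassifier(n_neighbors=3))
pipe.fit(X_train, y_train)

# One call: impute -> scale -> classify, using the statistics learned in fit()
y_pipe = pipe.predict(X_new)

# The same result step by step, reusing the fitted steps (transform, not fit_transform)
imputer = pipe.named_steps['simpleimputer']
scaler = pipe.named_steps['standardscaler']
knn = pipe.named_steps['kneighborsclassifier']
y_manual = knn.predict(scaler.transform(imputer.transform(X_new)))

print(y_pipe, y_manual)
```

Because the pipeline only calls transform() on the fitted preprocessing steps at prediction time, new data is processed with the statistics learned from the training set.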

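Before training the predictor (model), encode the target y. This is done separately from the pipeline built for X, because the transformers in a pipeline are applied to the features only, never to y. Use LabelEncoder to encode the target. (scikit-learn classifiers such as KNeighborsClassifier can also be trained on string labels directly, but this challenge works with an encoded y.)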
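A minimal sketch of encoding and decoding the target with LabelEncoder, using a few hand-written species labels (the list itself is illustrative). LabelEncoder sorts the class names and assigns them the integers 0, 1, 2, and so on; inverse_transform() reverses that mapping, which is how the solution decodes its predictions back into species names.

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative labels using the three penguin species names
y_raw = ['Gentoo', 'Adelie', 'Chinstrap', 'Adelie', 'Gentoo']

label_enc = LabelEncoder()
y = label_enc.fit_transform(y_raw)  # classes are sorted, then mapped to 0, 1, 2

print(label_enc.classes_)  # ['Adelie' 'Chinstrap' 'Gentoo']
print(y)                   # [2 0 1 0 2]

# Map integer codes back to the original string labels
decoded = label_enc.inverse_transform([0, 2])
print(decoded)             # ['Adelie' 'Gentoo']
```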

Task


You are given the same penguins dataset. Your task is to build a pipeline with KNeighborsClassifier as the final estimator, train it, and then predict values for X itself.

  1. Encode the target y with LabelEncoder.
  2. Create a pipeline containing, in this order, ct, SimpleImputer, StandardScaler, and KNeighborsClassifier.
  3. Train the pipe object on the features X and the target y.
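Before training, it can help to confirm that the pipeline holds the four steps in the expected order. A short sketch, assuming the same column names as the challenge: make_pipeline names each step after its class in lowercase, and indexing the pipeline with [-1] returns the final estimator. No data is needed here because nothing is fitted.

```python
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

ct = make_column_transformer((OneHotEncoder(), ['island', 'sex']),
                             remainder='passthrough')
pipe = make_pipeline(ct,
                     SimpleImputer(strategy='most_frequent'),
                     StandardScaler(),
                     KNeighborsClassifier())

# make_pipeline derives step names from the lowercase class names
print(list(pipe.named_steps))
# ['columntransformer', 'simpleimputer', 'standardscaler', 'kneighborsclassifier']

# The last step is the final estimator
print(type(pipe[-1]).__name__)  # KNeighborsClassifier
```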

Solution

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, LabelEncoder, StandardScaler
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins.csv')
# Removing rows with more than 1 null
df = df[df.isna().sum(axis=1) < 2]
# Assigning X, y variables
X, y = df.drop('species', axis=1), df['species']
# Encode the target
label_enc = LabelEncoder()
y = label_enc.fit_transform(y)
# Create the ColumnTransformer for encoding features
ct = make_column_transformer((OneHotEncoder(), ['island', 'sex']),
                             remainder='passthrough')
# Make a Pipeline of ct, SimpleImputer, StandardScaler, and KNeighborsClassifier
pipe = make_pipeline(ct,
                     SimpleImputer(strategy='most_frequent'),
                     StandardScaler(),
                     KNeighborsClassifier()
                     )
# Train the model
pipe.fit(X, y)
# Print predictions
y_pred = pipe.predict(X) # Get encoded predictions
print(label_enc.inverse_transform(y_pred)) # Decode predictions and print
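The solution predicts on X itself, but the main benefit of a fitted pipeline is predicting new instances. Since the course CSV must be downloaded, this self-contained sketch uses a tiny made-up DataFrame instead; the numeric column names bill_length_mm and body_mass_g, all values, and n_neighbors=3 are assumptions for illustration. The new row only needs the same columns as X, and the pipeline takes care of one-hot encoding, imputation, and scaling.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

# Made-up rows with penguin-like columns (hypothetical values, not the course dataset)
df = pd.DataFrame({
    'species': ['Adelie', 'Adelie', 'Gentoo', 'Gentoo', 'Chinstrap', 'Chinstrap'],
    'island': ['Torgersen', 'Dream', 'Biscoe', 'Biscoe', 'Dream', 'Dream'],
    'sex': ['MALE', 'FEMALE', 'MALE', 'FEMALE', 'MALE', 'FEMALE'],
    'bill_length_mm': [39.1, 37.8, 50.0, 46.5, 51.3, 46.4],
    'body_mass_g': [3750, 3400, 5700, np.nan, 3650, 3450],
})
X, y = df.drop('species', axis=1), df['species']

label_enc = LabelEncoder()
y_enc = label_enc.fit_transform(y)

ct = make_column_transformer((OneHotEncoder(), ['island', 'sex']),
                             remainder='passthrough')
pipe = make_pipeline(ct,
                     SimpleImputer(strategy='most_frequent'),
                     StandardScaler(),
                     KNeighborsClassifier(n_neighbors=3))  # 3 neighbours: the toy set has only 6 rows
pipe.fit(X, y_enc)

# A new, unseen instance with the same columns as X (hypothetical values)
X_new = pd.DataFrame({'island': ['Biscoe'], 'sex': ['MALE'],
                      'bill_length_mm': [49.0], 'body_mass_g': [5500.0]})

# Predict encoded labels, then decode them back to species names
pred = label_enc.inverse_transform(pipe.predict(X_new))
print(pred)
```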


Section 3. Chapter 6

Starter code (fill in each ___ blank):

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, LabelEncoder, StandardScaler
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins.csv')
# Removing rows with more than 1 null
df = df[df.isna().sum(axis=1) < 2]
# Assigning X, y variables
X, y = df.drop('species', axis=1), df['species']
# Encode the target
label_enc = LabelEncoder()
y = ___
# Create the ColumnTransformer for encoding features
ct = make_column_transformer((OneHotEncoder(), ['island', 'sex']),
                             remainder='passthrough')
# Make a Pipeline of ct, SimpleImputer, StandardScaler, and KNeighborsClassifier
pipe = ___(___, ___(___=___), ___(), ___())
# Train the model
___
# Print predictions
y_pred = pipe.predict(X) # Get encoded predictions
print(label_enc.inverse_transform(y_pred)) # Decode predictions and print
