ML Introduction with scikit-learn

Challenge: Creating a Pipeline

In this challenge, you will combine all the preprocessing steps covered so far into a single pipeline. The dataset is the original penguins.csv file we started from.

The first step is to remove the two rows that contain more than one missing value, since they carry too little information to be useful. Then you will create a pipeline that performs encoding, imputing, and scaling.
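
The row-removal step can be sketched on a small example. The DataFrame below is hypothetical toy data (not the real penguins.csv); it only illustrates how counting missing values per row and filtering on that count works.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for penguins.csv (assumption for illustration)
toy = pd.DataFrame({
    "island": ["Biscoe", "Dream", None, "Torgersen"],
    "body_mass_g": [3750.0, np.nan, np.nan, 4200.0],
    "sex": ["MALE", "FEMALE", None, None],
})

# Count missing values in each row (axis=1 sums across the columns)
missing_per_row = toy.isna().sum(axis=1)

# Keep only the rows with fewer than two missing values
filtered = toy[missing_per_row < 2]

print(missing_per_row.tolist())
print(len(filtered))
```

Rows with a single missing value are kept on purpose: the imputer in the pipeline can fill them in later.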

You need to encode only two columns, 'sex' and 'island'. Since you do not want to encode the entire X, you must use a ColumnTransformer. Afterward, apply the SimpleImputer and StandardScaler to the entire X.

Here is a reminder of the make_column_transformer() and make_pipeline() functions you will use.
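
As a rough reminder of how the two functions fit together, here is a minimal sketch on hypothetical toy data. It deliberately uses OrdinalEncoder and MinMaxScaler instead of the transformers required by the challenge, and the column names are made up for illustration.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, OrdinalEncoder

# Hypothetical toy data, not the penguins dataset
toy = pd.DataFrame({
    "size": ["small", "large", "medium", "small"],
    "weight": [1.0, 3.0, 2.0, 5.0],
})

# make_column_transformer takes (transformer, columns) tuples;
# remainder='passthrough' keeps the unlisted columns unchanged
ct = make_column_transformer(
    (OrdinalEncoder(), ["size"]),
    remainder="passthrough",
)

# make_pipeline chains the steps in order; each step receives
# the output of the previous one
pipe = make_pipeline(ct, MinMaxScaler())

result = pipe.fit_transform(toy)
print(result.shape)
print(list(pipe.named_steps))
```

Without remainder='passthrough', the ColumnTransformer would drop every column that is not listed in one of the tuples, which is its default behavior.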

Task


  1. Import the correct function for creating a pipeline.
  2. Make a ColumnTransformer with the OneHotEncoder applied only to columns 'sex' and 'island'.
  3. Make sure that all other columns remain untouched.
  4. Create a pipeline containing the ct you just created, a SimpleImputer that fills in missing values with the most frequent value, and a StandardScaler as the last step.
  5. Transform X using the pipe you created.

Solution

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins.csv')
# Removing rows with more than 1 null
df = df[df.isna().sum(axis=1) < 2]
# Assigning X, y variables
X, y = df.drop('species', axis=1), df['species']
# Create the ColumnTransformer for encoding
ct = make_column_transformer((OneHotEncoder(), ['sex', 'island']), remainder='passthrough')
# Make a Pipeline of ct, SimpleImputer, and StandardScaler
pipe = make_pipeline(ct, SimpleImputer(strategy='most_frequent'), StandardScaler())
# Transform X using the pipeline and print transformed X
X_transformed = pipe.fit_transform(X)
print(X_transformed)
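
One way to sanity-check the result of such a pipeline is to confirm that no missing values remain and that every column has been standardized. The sketch below assumes a hypothetical toy DataFrame with penguins-like column names rather than the real dataset, so it runs without downloading anything.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy rows with penguins-like columns (assumption for illustration)
toy = pd.DataFrame({
    "island": ["Biscoe", "Dream", "Dream", "Torgersen", "Biscoe"],
    "sex": ["MALE", "FEMALE", "MALE", "FEMALE", "MALE"],
    "body_mass_g": [3750.0, 3800.0, np.nan, 4200.0, 3800.0],
})

# Same structure as the pipeline in the solution
ct = make_column_transformer(
    (OneHotEncoder(), ["sex", "island"]),
    remainder="passthrough",
)
pipe = make_pipeline(ct, SimpleImputer(strategy="most_frequent"), StandardScaler())
out = pipe.fit_transform(toy)

# 2 sex columns + 3 island columns + 1 numeric column
print(out.shape)
# The imputer should have removed every NaN
print(np.isnan(out).any())
# After StandardScaler each column should have mean ~0 and std ~1
print(np.allclose(out.mean(axis=0), 0), np.allclose(out.std(axis=0), 1))
```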

Section 3. Chapter 4
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import make_column_transformer
from ___ import ___

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins.csv')
# Removing rows with more than 1 null
df = df[df.isna().sum(axis=1) < 2]
# Assigning X, y variables
X, y = df.drop('species', axis=1), df['species']
# Create the ColumnTransformer for encoding
ct = ___((___(), [___]), ___=___)
# Make a Pipeline of ct, SimpleImputer, and StandardScaler
pipe = ___(___, ___(___=___), ___())
# Transform X using the pipeline and print transformed X
X_transformed = ___
print(X_transformed)