Challenge: Creating a Pipeline
In this challenge, you will combine all the preprocessing steps covered so far into a single pipeline. The dataset is the original penguins.csv file we started from.
The first step is to remove the two useless rows, those with more than one missing value. Then you will create a pipeline that performs encoding, imputing, and scaling.
You need to encode only two columns, 'sex' and 'island'. Since you do not want to encode the entire X, you must use a ColumnTransformer. Afterward, apply the SimpleImputer and StandardScaler to the entire X.
Here is a reminder of the make_column_transformer() and make_pipeline() functions you will use.
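As a rough illustration of how the two functions fit together, here is a minimal sketch on a tiny made-up DataFrame. The names toy, toy_ct, toy_pipe, and the columns 'color' and 'size' are hypothetical and are not part of the penguins dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data (an assumption for illustration, not penguins.csv)
toy = pd.DataFrame({
    'color': ['red', 'blue', 'red', 'blue'],
    'size': [1.0, 2.0, np.nan, 4.0],
})

# make_column_transformer takes (transformer, columns) tuples;
# remainder='passthrough' keeps the columns that are not listed unchanged
toy_ct = make_column_transformer(
    (OneHotEncoder(), ['color']),
    remainder='passthrough',
)

# make_pipeline chains the steps in the order they are passed
toy_pipe = make_pipeline(
    toy_ct,
    SimpleImputer(strategy='most_frequent'),
    StandardScaler(),
)

# fit_transform fits every step and returns the transformed array
toy_result = toy_pipe.fit_transform(toy)
print(toy_result.shape)
```

The order of the steps matters here: the encoder runs first so that the imputer and the scaler only ever see numeric columns.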
Task
- Import the correct function for creating a pipeline.
- Make a ColumnTransformer with the OneHotEncoder applied only to columns 'sex' and 'island'.
- Make sure that all other columns remain untouched.
- Create a pipeline containing the ct you just created, a SimpleImputer that fills in missing values with the most frequent value, and a StandardScaler as the last step.
- Transform the X using the pipe you created.
Solution
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins.csv')
# Removing rows with more than 1 null
df = df[df.isna().sum(axis=1) < 2]
# Assigning X, y variables
X, y = df.drop('species', axis=1), df['species']
# Create the ColumnTransformer for encoding
ct = make_column_transformer((OneHotEncoder(), ['sex', 'island']), remainder='passthrough')
# Make a Pipeline of ct, SimpleImputer, and StandardScaler
pipe = make_pipeline(ct, SimpleImputer(strategy='most_frequent'), StandardScaler())
# Transform X using the pipeline and print transformed X
X_transformed = pipe.fit_transform(X)
print(X_transformed)
Starter code
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import make_column_transformer
from ___ import ___
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins.csv')
# Removing rows with more than 1 null
df = df[df.isna().sum(axis=1) < 2]
# Assigning X, y variables
X, y = df.drop('species', axis=1), df['species']
# Create the ColumnTransformer for encoding
ct = ___((___(), [___]), ___=___)
# Make a Pipeline of ct, SimpleImputer, and StandardScaler
pipe = ___(___, ___(___=___), ___())
# Transform X using the pipeline and print transformed X
X_transformed = ___
print(X_transformed)