Challenge: Creating a Pipeline
In this challenge, combine all preprocessing steps into a single pipeline using the original penguins.csv dataset.
- Remove the two rows with insufficient data.
- Build a pipeline that includes encoding, imputing, and scaling.
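The row-removal step can be sketched as follows. This is a minimal illustration on a hypothetical four-row stand-in for penguins.csv. The column names and the "fewer than two missing values" threshold are assumptions, not taken from the dataset description.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for penguins.csv: rows 1 and 3 are missing almost everything.
df_demo = pd.DataFrame({
    'island': ['Biscoe', 'Dream', 'Torgersen', 'Dream'],
    'culmen_length_mm': [39.1, np.nan, 40.3, np.nan],
    'body_mass_g': [3750.0, np.nan, 3250.0, np.nan],
    'sex': ['MALE', np.nan, np.nan, np.nan],
})

# Count missing values per row and keep only rows with fewer than two of them.
# The threshold of 2 is an assumption chosen for this toy data.
missing_per_row = df_demo.isna().sum(axis=1)
df_clean = df_demo[missing_per_row < 2]

print(df_clean.shape)
```

Rows with a single gap (here, a missing 'sex' value) are kept so that the imputer can fill them later in the pipeline.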
You need to encode only two columns, 'sex' and 'island'. Since you do not want to encode the entire X, you must use a ColumnTransformer. Afterward, apply the SimpleImputer and StandardScaler to the entire X.
Here is a reminder of the make_column_transformer() and make_pipeline() functions you will use.
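As a quick refresher, the sketch below shows both helper functions on a tiny made-up DataFrame. The data and the names toy, ct_toy, and pipe_toy are hypothetical. Passing sparse_threshold=0 is an added assumption that forces dense output so that StandardScaler can center the data.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data (hypothetical): one categorical and one numeric column.
toy = pd.DataFrame({'color': ['red', 'blue', 'red'], 'size': [1.0, 2.0, 3.0]})

# make_column_transformer takes (transformer, columns) tuples;
# remainder='passthrough' keeps the columns that are not listed.
ct_toy = make_column_transformer(
    (OneHotEncoder(), ['color']),
    remainder='passthrough',
    sparse_threshold=0,  # assumption: always return a dense array
)

# make_pipeline chains the steps in the order given and names them automatically.
pipe_toy = make_pipeline(ct_toy, StandardScaler())

toy_out = pipe_toy.fit_transform(toy)
print(toy_out.shape)
```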
You are given a DataFrame named df that contains penguin data.
Your goal is to build a preprocessing pipeline that handles missing values, encodes categorical columns, and scales numerical features.
- Import the make_pipeline function from sklearn.pipeline.
- Create a ColumnTransformer named ct that applies a OneHotEncoder to the 'sex' and 'island' columns while keeping all other columns unchanged (remainder='passthrough').
- Create a pipeline that includes the following steps in order:
  - The ColumnTransformer you defined (ct);
  - A SimpleImputer with the strategy set to 'most_frequent';
  - A StandardScaler for feature scaling.
- Apply the pipeline to the feature matrix X and store the transformed data in a variable named X_transformed.
Solution
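One possible way to put the steps together is sketched below. It is not the official solution. The small df built here is a hypothetical stand-in for the provided DataFrame, and the column names, the choice of 'species' as the target, the row-removal threshold, and sparse_threshold=0 are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical stand-in for df; in the exercise, df is already provided.
df = pd.DataFrame({
    'species': ['Adelie', 'Adelie', 'Gentoo', 'Gentoo', 'Chinstrap', 'Adelie'],
    'island': ['Torgersen', 'Torgersen', 'Biscoe', 'Biscoe', 'Dream', 'Dream'],
    'culmen_length_mm': [39.1, np.nan, 46.1, 50.0, 46.5, 37.8],
    'body_mass_g': [3750.0, np.nan, 4500.0, 5700.0, 3500.0, np.nan],
    'sex': ['MALE', np.nan, 'FEMALE', 'MALE', np.nan, 'FEMALE'],
})

# Drop rows with insufficient data (assumed here: two or more missing values).
df = df[df.isna().sum(axis=1) < 2]

# Assumption: 'species' is the target, everything else is a feature.
X = df.drop(columns='species')
y = df['species']

# Encode only 'sex' and 'island'; pass the remaining columns through unchanged.
# Recent scikit-learn versions encode a missing 'sex' value as its own category.
ct = make_column_transformer(
    (OneHotEncoder(), ['sex', 'island']),
    remainder='passthrough',
    sparse_threshold=0,  # assumption: force dense output for StandardScaler
)

# Encoding, then imputing, then scaling, in that order.
pipe = make_pipeline(ct, SimpleImputer(strategy='most_frequent'), StandardScaler())

X_transformed = pipe.fit_transform(X)
print(X_transformed.shape)
```

Because the encoder runs first, the imputer and scaler see a purely numeric array, which is why they can be applied to the whole of X.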