ML Introduction with scikit-learn

Challenge: Creating a Pipeline

In this challenge, combine all preprocessing steps into a single pipeline using the original penguins.csv dataset.

  1. Remove the two rows with insufficient data.
  2. Build a pipeline that includes encoding, imputing, and scaling.
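One way to handle step 1 is with dropna(thresh=...), which keeps only rows that have at least a given number of non-missing values. The DataFrame below is a hypothetical stand-in for penguins.csv, where the two bad rows carry data only in 'species' and 'island':

```python
import pandas as pd

# Hypothetical frame mimicking penguins.csv: two rows carry almost no data.
df = pd.DataFrame({
    'species': ['Adelie', 'Adelie', 'Adelie', 'Gentoo'],
    'island': ['Torgersen', 'Torgersen', 'Torgersen', 'Biscoe'],
    'bill_length_mm': [39.1, None, 40.3, None],
    'sex': ['male', None, 'female', None],
})

# dropna(thresh=3) keeps only rows with at least 3 non-missing values,
# removing the two rows that have data only in 'species' and 'island'.
df = df.dropna(thresh=3)
```

Adjust the threshold to match how many columns the insufficient rows are actually missing in your data.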

You need to encode only two columns, 'sex' and 'island'. Since the encoder should not be applied to the entire X, you must use a ColumnTransformer. Afterward, apply the SimpleImputer and StandardScaler to the entire X.

Here is a reminder of the make_column_transformer() and make_pipeline() functions you will use.
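As a quick sketch of how these two helpers fit together (step names below are the auto-generated ones, derived from the lowercased class names):

```python
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# make_column_transformer() takes (transformer, columns) pairs and
# builds a ColumnTransformer with auto-generated step names.
ct = make_column_transformer(
    (OneHotEncoder(), ['sex', 'island']),
    remainder='passthrough',  # leave all other columns unchanged
)

# make_pipeline() chains estimators; step names come from class names.
pipe = make_pipeline(ct, StandardScaler())
```

Both helpers save you from naming each step by hand, which is all this challenge needs.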

Task


You are given a DataFrame named df that contains penguin data. Your goal is to build a preprocessing pipeline that handles missing values, encodes categorical columns, and scales numerical features.

  1. Import the make_pipeline function from sklearn.pipeline.
  2. Create a ColumnTransformer named ct that applies a OneHotEncoder to the 'sex' and 'island' columns while keeping all other columns unchanged (remainder='passthrough').
  3. Create a pipeline that includes the following steps in order:
    • The ColumnTransformer you defined (ct);
    • A SimpleImputer with the strategy set to 'most_frequent';
    • A StandardScaler for feature scaling.
  4. Apply the pipeline to the feature matrix X and store the transformed data in a variable named X_transformed.

Solution


Section 3. Chapter 4
