Learn Challenge: Preprocessing Pipeline | Feature Engineering for Machine Learning
Data Preprocessing and Feature Engineering

Challenge: Preprocessing Pipeline

Task

You are given the Titanic dataset from the seaborn library. Your task is to build a complete preprocessing pipeline that applies the standard transformations needed before training a machine learning model: missing-value imputation, categorical encoding, feature scaling, and feature creation.

Follow these steps:

  1. Load the dataset using sns.load_dataset("titanic").
  2. Handle missing values:
    • Numeric columns → fill with mean.
    • Categorical columns → fill with mode.
  3. Encode the categorical features sex and embarked using pd.get_dummies().
  4. Scale numeric columns age and fare using StandardScaler.
  5. Create a new feature family_size = sibsp + parch + 1.
  6. Combine all transformations into a function called preprocess_titanic(data) that returns the final processed DataFrame.
  7. Assign the processed dataset to a variable called processed_data.

Print the first 5 rows of the final DataFrame.

Solution
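
One possible solution, following the steps above, is sketched below. It is a minimal sketch rather than the official answer. It assumes that "categorical" means any column that is neither numeric nor boolean, that the input DataFrame should not be modified in place, and it uses a hypothetical helper named main() to hold the loading and printing steps.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def preprocess_titanic(data: pd.DataFrame) -> pd.DataFrame:
    """Return a preprocessed copy of the Titanic DataFrame."""
    df = data.copy()  # assumption: leave the caller's DataFrame untouched

    # Step 2a: fill missing numeric values with each column's mean.
    num_cols = df.select_dtypes(include="number").columns
    df[num_cols] = df[num_cols].fillna(df[num_cols].mean())

    # Step 2b: fill missing categorical values with each column's mode.
    # Assumption: "categorical" = every column that is not numeric or boolean.
    cat_cols = df.select_dtypes(exclude=["number", "bool"]).columns
    for col in cat_cols:
        modes = df[col].mode()
        if not modes.empty:  # skip a column that has no observed values
            df[col] = df[col].fillna(modes.iloc[0])

    # Step 5: family size = siblings/spouses + parents/children + the passenger.
    df["family_size"] = df["sibsp"] + df["parch"] + 1

    # Step 3: one-hot encode sex and embarked (the original columns are dropped).
    df = pd.get_dummies(df, columns=["sex", "embarked"])

    # Step 4: standardize age and fare to zero mean and unit variance.
    df[["age", "fare"]] = StandardScaler().fit_transform(df[["age", "fare"]])

    return df


def main() -> None:
    """Hypothetical entry point covering steps 1 and 7 and the final print."""
    import seaborn as sns  # imported here so the function above needs only pandas and scikit-learn

    processed_data = preprocess_titanic(sns.load_dataset("titanic"))
    print(processed_data.head())
```

Calling main() loads the dataset and prints the first 5 rows. Missing values are filled before scaling because StandardScaler ignores NaN when fitting and leaves it in the output, so the gaps in age would otherwise survive into the scaled column.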
