Introduction to Machine Learning with Python: Preprocessing Data with Scikit-learn

Challenge: Encoding Categorical Variables

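To summarize the previous three chapters, here is which encoder to use:

  - Nominal (unordered) feature: OneHotEncoder
  - Ordinal (ordered) feature: OrdinalEncoder
  - Categorical target: LabelEncoder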

In this challenge, you are given the penguins dataset with its missing values already imputed. All categorical columns, including the target ('species'), must be encoded.

Here is a reminder of the dataset structure:

    import pandas as pd

    df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_imputed.csv')
    print(df.head())

Keep in mind that 'island' and 'sex' are nominal (unordered) categorical features and 'species' is a categorical target.
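If you want to confirm which columns need encoding, a minimal sketch is to list the non-numeric columns and count their categories. To keep it runnable without downloading the CSV, it uses a few hypothetical rows; the numeric column name 'body_mass_g' and all values are assumptions, not taken from the real file.

```python
import pandas as pd

# Hypothetical rows mimicking the penguins columns; the real df comes from the CSV
df = pd.DataFrame({
    'species': ['Adelie', 'Gentoo', 'Chinstrap'],
    'island': ['Torgersen', 'Biscoe', 'Dream'],
    'body_mass_g': [3750.0, 5000.0, 3700.0],
    'sex': ['MALE', 'FEMALE', 'FEMALE'],
})

# Any non-numeric column must be encoded before it reaches a model
categorical = [col for col in df.columns if not pd.api.types.is_numeric_dtype(df[col])]
print(categorical)  # ['species', 'island', 'sex']

# Distinct categories per column; for one-hot encoding this is the number of new columns
print(df[categorical].nunique())
```

Checking the dtype with is_numeric_dtype avoids depending on whether pandas stores text as object or as a dedicated string dtype.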

Task


You are given a DataFrame named df that contains penguin data.
Your task is to encode all categorical columns (the features and the target) so that the data can be used in a machine learning model.

  1. Import the OneHotEncoder and LabelEncoder classes from sklearn.preprocessing.
  2. Separate the feature matrix X and the target variable y from the DataFrame.
  3. Create a OneHotEncoder object and apply it to the 'island' and 'sex' columns in X.
  4. Replace the original categorical columns with the encoded ones.
  5. Create a LabelEncoder object and apply it to the 'species' column to encode the target variable y.


Section 2. Chapter 8