Learn OrdinalEncoder | Preprocessing Data with Scikit-learn
ML Introduction with scikit-learn

OrdinalEncoder

The next problem to tackle is categorical data. Recall that there are two types of categorical data: ordinal and nominal.

Ordinal data follows a natural order, while nominal data does not. Because ordinal categories have a natural order, we can encode them as numbers that preserve that order.

For example, we would encode the 'rate' column containing 'Terrible', 'Bad', 'OK', 'Good', and 'Great' values as follows:

  • 'Terrible' – 0;

  • 'Bad' – 1;

  • 'OK' – 2;

  • 'Good' – 3;

  • 'Great' – 4.

To encode ordinal data, use OrdinalEncoder. It maps the categories to the numbers 0, 1, 2, and so on.

OrdinalEncoder is used like any other transformer. The only difficulty is specifying the categories argument correctly.
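To see why the categories argument matters, here is a minimal sketch on a hypothetical toy 'rate' column (not the dataset used in this chapter). It compares the encoder's default behavior, which sorts the unique values, with an explicitly supplied order. The variable names are illustrative assumptions.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data, made up for illustration only
ratings = pd.DataFrame({'rate': ['Good', 'Terrible', 'OK', 'Great', 'Bad']})

# Without `categories`, the encoder sorts the unique values
# (alphabetically for strings), which ignores the natural order
default_enc = OrdinalEncoder()
default_codes = default_enc.fit_transform(ratings[['rate']]).ravel()

# With `categories`, the codes follow the order we supply
rate_order = ['Terrible', 'Bad', 'OK', 'Good', 'Great']
ordered_enc = OrdinalEncoder(categories=[rate_order])
ordered_codes = ordered_enc.fit_transform(ratings[['rate']]).ravel()

print(default_codes)  # codes derived from alphabetical order
print(ordered_codes)  # codes derived from rate_order
```

With the default settings, 'Bad' gets 0 and 'Terrible' gets 4 simply because of alphabetical sorting, so the codes do not reflect how good a rating is.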

Let's look at an example. We have a dataset (not the penguins dataset) with an 'education' column. First, let's examine its unique values.

import pandas as pd

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/adult_edu.csv')

print(df['education'].unique())

We need to create a list of the categories in their natural order, in this case from 'HS-grad' to 'Doctorate'.

import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Load the data and assign X, y variables
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/adult_edu.csv')
y = df['income'] # 'income' is a target in this dataset
X = df.drop('income', axis=1)
# Create a list of categories so HS-grad is encoded as 0 and Doctorate as 6
edu_categories = ['HS-grad', 'Some-college', 'Assoc', 'Bachelors', 'Masters', 'Prof-school', 'Doctorate']
# Initialize an OrdinalEncoder instance with the correct categories
ord_enc = OrdinalEncoder(categories=[edu_categories])
# Transform the 'education' column and print it
X['education'] = ord_enc.fit_transform(X[['education']])
print(X['education'])
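One way to double-check the resulting encoding is to inspect the fitted encoder. The sketch below uses a small hypothetical sample in place of the real 'education' column (an assumption, to keep the example self-contained) and relies on the encoder's categories_ attribute and inverse_transform method.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical sample standing in for the real 'education' column
sample = pd.DataFrame({'education': ['Masters', 'HS-grad', 'Doctorate', 'Assoc']})

edu_categories = ['HS-grad', 'Some-college', 'Assoc', 'Bachelors',
                  'Masters', 'Prof-school', 'Doctorate']
ord_enc = OrdinalEncoder(categories=[edu_categories])
encoded = ord_enc.fit_transform(sample[['education']])

# categories_ holds one array per encoded column, in the order used for the codes
mapping = {category: code for code, category in enumerate(ord_enc.categories_[0])}
print(mapping)

# inverse_transform converts the codes back to the original labels
restored = ord_enc.inverse_transform(encoded).ravel()
print(restored)
```

If the printed mapping does not match the intended order, the list passed to categories is the place to fix it.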

If you need to transform multiple features with OrdinalEncoder, specify the categories for each column. Pass one list per column to the categories argument, in the same order as the columns being transformed:

encoder = OrdinalEncoder(categories=[col1_categories, col2_categories, ...])
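As a rough illustration of the multi-column case, the following sketch encodes two hypothetical ordinal columns, 'rate' and 'size'. The data, column names, and category lists are assumptions made up for this example.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical data with two ordinal columns
df_demo = pd.DataFrame({
    'rate': ['Good', 'Bad', 'Great'],
    'size': ['M', 'S', 'L'],
})

rate_categories = ['Terrible', 'Bad', 'OK', 'Good', 'Great']
size_categories = ['S', 'M', 'L']

# The order of the category lists must match the order of the columns passed in
cols = ['rate', 'size']
encoder = OrdinalEncoder(categories=[rate_categories, size_categories])
encoded = encoder.fit_transform(df_demo[cols])

# Write the encoded values back into the DataFrame
df_demo[cols] = encoded
print(df_demo)
```

Swapping the two lists without swapping the columns would raise an error here, because the values in 'rate' are not found in size_categories.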

1. Which statement best describes the use of the OrdinalEncoder for handling categorical data in a dataset?

2. Suppose you have a categorical column named 'Color'. Would it be appropriate to use the OrdinalEncoder to encode its values?
