OrdinalEncoder
The next problem we will solve is categorical data. Recall that there are two types of categorical data.
![](https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/nominal_ordinal.gif)
Ordinal data follows some natural order, while nominal does not.
Because ordinal data has a natural order, we can encode its categories as numbers that follow that order.
For example, we would encode the 'rate' column containing 'Terrible', 'Bad', 'OK', 'Good', and 'Great' values like:
- 'Terrible' – 0;
- 'Bad' – 1;
- 'OK' – 2;
- 'Good' – 3;
- 'Great' – 4.
To encode ordinal data, `OrdinalEncoder` is used. It simply encodes the categories as 0, 1, 2, ...
![](https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/OrdinalEncoderClass.png)
Here is an image showing how it works.
![](https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/Ordinal.png)
`OrdinalEncoder` is as easy to use as any other transformer. The only difficulty is specifying the `categories` argument correctly.
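To see why the `categories` argument matters, here is a minimal sketch using the 'rate' values from above. The DataFrame is made up for illustration. When `categories` is not given, scikit-learn determines the categories from the data and sorts them, which for strings means alphabetical order rather than the natural one.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical data: a single 'rate' column with the five values from the text
df = pd.DataFrame({'rate': ['Good', 'Terrible', 'OK', 'Great', 'Bad']})

# Without `categories`, the unique values are sorted alphabetically:
# 'Bad', 'Good', 'Great', 'OK', 'Terrible'
default_enc = OrdinalEncoder()
default_codes = default_enc.fit_transform(df[['rate']])
print(default_enc.categories_)
print(default_codes.ravel())

# With `categories`, the codes follow the order we supply
rate_order = ['Terrible', 'Bad', 'OK', 'Good', 'Great']
ordered_enc = OrdinalEncoder(categories=[rate_order])
ordered_codes = ordered_enc.fit_transform(df[['rate']])
print(ordered_codes.ravel())
```

In the second case 'Terrible' maps to 0 and 'Great' to 4, matching the list above, while the default encoder would assign 0 to 'Bad' and 4 to 'Terrible'.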
Let's look at an example. We have a dataset (not the Penguins dataset) with an 'education' column. First, we inspect its unique values.
We need to create a list of ordered categorical values, in this case, from 'HS-grad' to 'Doctorate'.
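The steps above might look like the following sketch. The lesson's actual dataset is not shown here, so the DataFrame below is a small hypothetical stand-in, and the intermediate levels between 'HS-grad' and 'Doctorate' are assumed for illustration only.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical stand-in for the lesson's dataset
df = pd.DataFrame({
    'education': ['Bachelors', 'HS-grad', 'Doctorate',
                  'Masters', 'HS-grad', 'Some-college']
})

# Inspect the distinct values present in the column
print(df['education'].unique())

# Ordered from lowest to highest level (intermediate levels are assumed)
education_order = ['HS-grad', 'Some-college', 'Bachelors', 'Masters', 'Doctorate']

# `categories` takes one list per column, hence the outer list
enc = OrdinalEncoder(categories=[education_order])

# fit_transform returns a 2D array; flatten it to store as a single column
df['education_encoded'] = enc.fit_transform(df[['education']]).ravel()
print(df)
```

Each value is replaced by its position in `education_order`, so 'HS-grad' becomes 0 and 'Doctorate' becomes 4.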
Note
`OrdinalEncoder` is mostly used to transform the features (the `X` variable), and the `X` variable is usually a DataFrame containing more than one column.
Because of that, the `categories` argument allows specifying the list of categories for each column, e.g., `categories=[col1_categories, col2_categories]`.
And if you want to transform only one column, you should still pass a list containing another list, e.g., `categories=[col1_categories]`.
That's also the reason the `.fit_transform()` method expects a DataFrame and doesn't work with a Series, so you need to pass `df[['column']]` to transform only one column.
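As a rough illustration of the note, the sketch below encodes two columns at once and then a single column. The 'size' column and its ordering are invented for this example.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical feature table with two ordinal columns
X = pd.DataFrame({
    'rate': ['Good', 'Bad', 'Great'],
    'size': ['M', 'S', 'L'],
})

rate_order = ['Terrible', 'Bad', 'OK', 'Good', 'Great']
size_order = ['S', 'M', 'L']  # assumed ordering for the made-up column

# One list of categories per column, in the same order as the columns of X
enc = OrdinalEncoder(categories=[rate_order, size_order])
X_encoded = enc.fit_transform(X)
print(X_encoded)

# A single column still needs a list of lists and a 2D input (double brackets)
single_enc = OrdinalEncoder(categories=[rate_order])
rate_encoded = single_enc.fit_transform(X[['rate']])
print(rate_encoded.shape)

# A Series is 1D, so the encoder rejects it
series_rejected = False
try:
    single_enc.fit_transform(X['rate'])
except ValueError as err:
    series_rejected = True
    print('Series input failed:', err)
```

`X[['rate']]` is a one-column DataFrame, while `X['rate']` is a Series, which is why only the former is accepted.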