OrdinalEncoder | Preprocessing Data with Scikit-learn
ML Introduction with scikit-learn

OrdinalEncoder

The next problem to solve is categorical data. Recall that categorical data comes in two types: ordinal and nominal.

Ordinal data follows a natural order, while nominal data does not.
Because the order is meaningful, we can encode ordinal categories as numbers that preserve it.
For example, a 'rate' column containing the values 'Terrible', 'Bad', 'OK', 'Good', and 'Great' would be encoded as follows:

  • 'Terrible' – 0;
  • 'Bad' – 1;
  • 'OK' – 2;
  • 'Good' – 3;
  • 'Great' – 4.

To encode ordinal data, scikit-learn provides the OrdinalEncoder transformer. It maps each category to a number 0, 1, 2, ... according to the category's position in the order.
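
As a minimal sketch of the mapping above, the snippet below encodes a 'rate' column. The five-row DataFrame is made up for illustration; only the category names and their order come from the text.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical sample data; only the category names come from the lesson
df = pd.DataFrame({'rate': ['Good', 'Terrible', 'OK', 'Great', 'Bad']})

# Categories listed from lowest to highest
rate_order = ['Terrible', 'Bad', 'OK', 'Good', 'Great']

# categories takes one list per column, hence the outer list
encoder = OrdinalEncoder(categories=[rate_order])

# The encoder expects 2D input, so select the column with double brackets
encoded = encoder.fit_transform(df[['rate']])
print(encoded.ravel())  # [3. 0. 2. 4. 1.]
```

Note that the result is a NumPy array of floats by default, not integers.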

Here is an image showing how it works.

OrdinalEncoder is used like any other transformer. The only difficulty is specifying the categories argument correctly.
Let's look at an example. We have a dataset (not the Penguins dataset) with an 'education' column. First, let's look at its unique values.

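We need to create a list of the categories in their intended order, in this case from 'HS-grad' to 'Doctorate'. If categories is not specified, the encoder sorts the unique values itself (alphabetically for strings), which rarely matches the intended order.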
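
The workflow might look like the sketch below. The lesson's dataset is not reproduced here, so the sample rows and the intermediate levels ('Bachelors', 'Masters') are assumptions; only the endpoints 'HS-grad' and 'Doctorate' come from the text.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical stand-in for the lesson's dataset
df = pd.DataFrame(
    {'education': ['Bachelors', 'HS-grad', 'Doctorate', 'Masters', 'HS-grad']}
)

# Inspect the unique values before deciding on an order
print(df['education'].unique())

# Ordered from lowest to highest; the middle levels are assumed for illustration
education_order = ['HS-grad', 'Bachelors', 'Masters', 'Doctorate']

encoder = OrdinalEncoder(categories=[education_order])

# fit_transform returns a 2D array; ravel() flattens it into a single column
df['education_encoded'] = encoder.fit_transform(df[['education']]).ravel()
print(df)
```

With this order, 'HS-grad' becomes 0.0 and 'Doctorate' becomes 3.0.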

Note

OrdinalEncoder is mostly used to transform the features (the X variable), and X is usually a DataFrame with more than one column.
For that reason, the categories argument takes one list of categories per column, e.g., categories=[col1_categories, col2_categories].
Even if you transform only one column, you still need to pass a list containing another list, e.g., categories=[col1_categories].
This is also why the .fit_transform() method expects two-dimensional input such as a DataFrame and does not work with a Series: to transform a single column, pass df[['column']] rather than df['column'].
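
The sketch below illustrates both points of the note: one category list per column, and 2D input. The data and the intermediate education levels are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical data with two ordinal columns
df = pd.DataFrame({
    'rate': ['Good', 'Bad', 'Great'],
    'education': ['Masters', 'HS-grad', 'Doctorate'],
})

rate_order = ['Terrible', 'Bad', 'OK', 'Good', 'Great']
education_order = ['HS-grad', 'Bachelors', 'Masters', 'Doctorate']  # assumed levels

# One category list per column, in the same order as the selected columns
encoder = OrdinalEncoder(categories=[rate_order, education_order])
encoded = encoder.fit_transform(df[['rate', 'education']])
print(encoded)

# A Series is 1D, while a single-column DataFrame is 2D
print(df['rate'].shape)    # (3,)
print(df[['rate']].shape)  # (3, 1)

# Passing the Series is rejected because the encoder requires 2D input
try:
    OrdinalEncoder(categories=[rate_order]).fit_transform(df['rate'])
    series_accepted = True
except ValueError as error:
    series_accepted = False
    print('Series rejected:', error)
```

Categories that are listed but absent from the data (here 'Terrible' and 'OK') are fine. By default the encoder only raises an error for values in the data that are missing from the list.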
