One-Hot Encoder | Preprocessing Data with Scikit-learn
ML Introduction with scikit-learn

One-Hot Encoder

The story with nominal values is a little more complicated.

Suppose a feature contains ordinal data, for example user ratings with the values Terrible, Bad, OK, Good, and Great. Encoding those values as 0 to 4 is logical, and the ML model will take the order into account. Now imagine a 'city' feature containing five different cities. If we encode them as 0 to 4, the ML model will also assume there is a logical order, but there is none. So we need a slightly more complex encoding called One-Hot Encoding.
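The difference between the two cases can be sketched in plain Python. The city names below (other than 'NewYork') and the variable names are illustrative assumptions, not values from the course dataset.

```python
# Ordinal feature: the integer codes follow the real order of the ratings.
ratings = ["Terrible", "Bad", "OK", "Good", "Great"]
rating_codes = {label: code for code, label in enumerate(ratings)}

# Comparing codes is meaningful here: "Good" really is better than "Bad".
good_beats_bad = rating_codes["Good"] > rating_codes["Bad"]

# Nominal feature: hypothetical city names, encoded the same way.
cities = ["NewYork", "London", "Paris", "Tokyo", "Berlin"]
city_codes = {label: code for code, label in enumerate(cities)}

# This comparison is also True, but only because of list position.
# A model would still treat it as "Paris is greater than London".
paris_beats_london = city_codes["Paris"] > city_codes["London"]

print(rating_codes)
print(city_codes)
```

Both comparisons evaluate to True, but only the first one reflects something real about the data.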

To encode nominal data, the OneHotEncoder transformer is used. It creates a column for each unique value. Then, for each row, it sets 1 in the column matching that row's value and 0 in all the other columns.

Here is an image showing how it works.

What was originally 'NewYork' now has 1 in a City_NewYork column and 0 in other City_ columns.
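To make the mechanism concrete, here is a minimal hand-rolled sketch using NumPy. It only illustrates the idea and is not how OneHotEncoder is implemented; the sample cities and the `City_` column naming are assumptions chosen to match the description above.

```python
import numpy as np

# Hypothetical 'city' column with three distinct values.
cities = np.array(["NewYork", "London", "NewYork", "Paris"])

# One output column per unique value (np.unique returns them sorted).
categories = np.unique(cities)
columns = [f"City_{name}" for name in categories]

# Compare every row against every category: a match becomes 1, the rest 0.
one_hot = (cities[:, None] == categories[None, :]).astype(int)

print(columns)
print(one_hot)
```

Each row contains exactly one 1, located in the column of that row's original value.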

Let's use OneHotEncoder on our Penguins dataset! It has two nominal features, 'island' and 'sex'. The 'species' column is not counted here; we will learn how to deal with target encoding in the next chapter.

To use OneHotEncoder, you just need to initialize an object and pass the columns to .fit_transform(), as with any other transformer.

Note

By default, OneHotEncoder returns a sparse matrix, which can be converted to a NumPy array using the .toarray() method.
You will not have to do this once you learn pipelines.

OneHotEncoder creates new columns. Is this correct?
