Foundations of Machine Learning

One-Hot Encoder


Nominal values are a bit more complex to handle than ordinal ones.

For ordinal data, such as user ratings ranging from 'Terrible' to 'Great', encoding them as numbers from 0 to 4 is appropriate because the model can capture the inherent order.
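As a minimal sketch of the ordinal case, the snippet below maps five ratings to the numbers 0 to 4 with scikit-learn's OrdinalEncoder. The 'rating' column and the intermediate labels 'Bad', 'Okay', and 'Good' are assumptions made for illustration; only 'Terrible' and 'Great' come from the text above.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical ratings column; the intermediate labels are illustrative
ratings = pd.DataFrame({'rating': ['Good', 'Terrible', 'Great', 'Okay', 'Bad']})

# List the categories from worst to best so the codes follow that order
order = ['Terrible', 'Bad', 'Okay', 'Good', 'Great']
ordinal = OrdinalEncoder(categories=[order])

# Each rating is replaced by its position in `order` (0 for 'Terrible', 4 for 'Great')
encoded = ordinal.fit_transform(ratings[['rating']])
print(encoded.ravel())
```

Passing the categories explicitly matters here: without it, the encoder would sort the labels alphabetically, and the codes would no longer reflect the rating order.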

In contrast, for a feature like 'city' with five distinct categories, encoding them as numbers from 0 to 4 would incorrectly suggest an order. In this case, one-hot encoding is a better choice, as it represents categories without implying a hierarchy.

To encode nominal data, use the OneHotEncoder transformer. It creates one column for each unique value of a feature. For each row, it puts 1 in the column that matches the row's value and 0 in the other columns.

For example, a row whose value was originally 'NewYork' now has 1 in the 'City_NewYork' column and 0 in the other City_ columns.

Apply OneHotEncoder to the penguins dataset. The nominal features are 'island' and 'sex'. The 'species' column is the target and will be handled separately when discussing target encoding in the next chapter.

    import pandas as pd

    df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_imputed.csv')

    print('island: ', df['island'].unique())
    print('sex: ', df['sex'].unique())

To apply OneHotEncoder, initialize the encoder object and pass the selected columns to .fit_transform(), in the same way as with other transformers.

    import pandas as pd
    from sklearn.preprocessing import OneHotEncoder

    df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_imputed.csv')
    # Assign X, y variables
    y = df['species']
    X = df.drop('species', axis=1)
    # Initialize a OneHotEncoder object
    one_hot = OneHotEncoder()
    # Print transformed 'sex', 'island' columns
    print(one_hot.fit_transform(X[['sex', 'island']]).toarray())
Note

The .toarray() method converts the sparse matrix output from the OneHotEncoder into a dense NumPy array. Dense arrays display all values explicitly, making visualization and manipulation of the encoded data within a DataFrame easier. Sparse matrices store only non-zero elements, optimizing memory use. You can omit this method to see the difference in output.
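The difference between the two outputs can be checked on a small stand-in for the 'sex' and 'island' columns. This is only a sketch: the three rows and their values are assumptions for illustration, not taken from the penguins file.

```python
import pandas as pd
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder

# Small stand-in for the 'sex' and 'island' columns; values are illustrative
X = pd.DataFrame({
    'sex': ['MALE', 'FEMALE', 'FEMALE'],
    'island': ['Torgersen', 'Biscoe', 'Dream'],
})

one_hot = OneHotEncoder()
encoded_sparse = one_hot.fit_transform(X)   # SciPy sparse matrix by default
encoded_dense = encoded_sparse.toarray()    # dense NumPy array

print(sparse.issparse(encoded_sparse))  # whether the raw output is sparse
print(encoded_sparse.nnz)               # number of stored non-zero entries
print(encoded_dense.shape)              # rows x one-hot columns
```

The sparse result stores one non-zero entry per encoded feature per row, while the dense array also holds every 0 explicitly.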


OneHotEncoder creates new columns. Is this correct?
