Foundations of Machine Learning

ColumnTransformer


When you call .fit_transform(X) on a Pipeline, every transformer in it is applied to all columns of X, which is not always desirable. Different columns may need different encoders, for example OrdinalEncoder for ordinal features and OneHotEncoder for nominal ones. ColumnTransformer solves this by letting you assign a different transformer to each group of columns. The helper function make_column_transformer builds a ColumnTransformer for you.

make_column_transformer accepts (transformer, columns) tuples, where columns is a list of column names. For example, to apply OrdinalEncoder to 'education' and OneHotEncoder to 'gender':

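from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

ct = make_column_transformer(
    (OrdinalEncoder(), ['education']),
    (OneHotEncoder(), ['gender']),
    remainder='passthrough'
)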
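make_column_transformer is a shorthand for the ColumnTransformer class. The sketch below, built on a hypothetical toy DataFrame with the same column names as above, constructs the same transformer both ways. The step names 'ordinal' and 'onehot' are arbitrary choices for illustration; make_column_transformer instead derives names from the lowercase class names.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy data using the column names from the example above
df = pd.DataFrame({
    'education': ['high school', 'bachelor', 'master', 'bachelor'],
    'gender': ['male', 'female', 'female', 'male'],
    'income': [30000, 45000, 52000, 61000],
})

# make_column_transformer names each step after its class, in lowercase
ct_auto = make_column_transformer(
    (OrdinalEncoder(), ['education']),
    (OneHotEncoder(), ['gender']),
    remainder='passthrough',
)

# ColumnTransformer itself takes (name, transformer, columns) tuples;
# the names 'ordinal' and 'onehot' are arbitrary
ct_named = ColumnTransformer(
    [
        ('ordinal', OrdinalEncoder(), ['education']),
        ('onehot', OneHotEncoder(), ['gender']),
    ],
    remainder='passthrough',
)

print([name for name, _, _ in ct_auto.transformers])  # ['ordinalencoder', 'onehotencoder']
print(np.array_equal(ct_auto.fit_transform(df), ct_named.fit_transform(df)))  # True
```

Explicit names are mainly useful when you later need to refer to a step, for example in get_feature_names_out or when tuning parameters.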
Note

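remainder controls what happens to the columns that are not listed in any tuple. By default (remainder='drop'), they are removed from the output. To keep them unchanged, set remainder='passthrough'; the passthrough columns are placed after the transformed ones.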

For example, consider the exams.csv file. It contains several nominal columns ('gender', 'race/ethnicity', 'lunch', 'test preparation course') and one ordinal column, 'parental level of education'.

import pandas as pd

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/exams.csv')
print(df.head())

Using ColumnTransformer, nominal data can be transformed with OneHotEncoder and ordinal data with OrdinalEncoder in a single step.

import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/exams.csv')

# Education levels from lowest to highest; OrdinalEncoder encodes them as 0, 1, 2, ...
edu_categories = ['some high school', 'high school', 'some college',
                  "associate's degree", "bachelor's degree", "master's degree"]

ct = make_column_transformer(
    (OrdinalEncoder(categories=[edu_categories]), ['parental level of education']),
    (OneHotEncoder(), ['gender', 'race/ethnicity', 'lunch', 'test preparation course']),
    remainder='passthrough'
)
print(ct.fit_transform(df))
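The categories argument matters because, without it, OrdinalEncoder sorts categories alphabetically. The sketch below uses a few hypothetical values of the 'parental level of education' column (not rows from exams.csv) and assumes the level order shown in the list, from lowest to highest.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical sample of the 'parental level of education' column
edu = pd.DataFrame({'parental level of education': [
    'some high school', "master's degree", 'high school', "bachelor's degree",
]})

# Assumed order from lowest to highest level
edu_categories = ['some high school', 'high school', 'some college',
                  "associate's degree", "bachelor's degree", "master's degree"]

# Default: categories sorted alphabetically, unrelated to the real order
default_codes = OrdinalEncoder().fit_transform(edu).ravel()

# Explicit categories: each code is the position in edu_categories
ordered_codes = OrdinalEncoder(categories=[edu_categories]).fit_transform(edu).ravel()

print(default_codes)  # [3. 2. 1. 0.]
print(ordered_codes)  # [0. 5. 1. 4.]
```

With the default, 'some high school' gets the highest code simply because it comes last alphabetically; with explicit categories, the codes follow the education level.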

The ColumnTransformer is itself a transformer, so it provides the standard methods .fit(), .fit_transform(), and .transform().
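Because it is a transformer, a ColumnTransformer can be fitted on training data, reused on new rows, and placed as a step in a Pipeline. This is a minimal sketch with hypothetical data and labels; DecisionTreeClassifier is an arbitrary model choice for illustration, not part of the original lesson.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data and labels
X_train = pd.DataFrame({
    'gender': ['male', 'female', 'female', 'male', 'female', 'male'],
    'income': [30000, 45000, 52000, 61000, 38000, 70000],
})
y_train = [0, 0, 1, 1, 0, 1]

ct = make_column_transformer(
    # handle_unknown='ignore' encodes unseen categories as all zeros
    (OneHotEncoder(handle_unknown='ignore'), ['gender']),
    remainder='passthrough',
)

# .fit() learns the categories of 'gender' from the training data only
ct.fit(X_train)

# .transform() applies what was learned to new rows
X_new = pd.DataFrame({'gender': ['female'], 'income': [50000]})
print(ct.transform(X_new))

# As a transformer, ct can also be a Pipeline step;
# the pipeline refits it together with the model
model = make_pipeline(ct, DecisionTreeClassifier(random_state=0))
model.fit(X_train, y_train)
print(model.predict(X_new))
```

Fitting the encoder inside the Pipeline ensures that the categories are learned from the same data as the model.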


Suppose you have a dataset with the features 'education', 'income', and 'job'. What happens to the 'income' column after running the following code? (Note that the remainder argument is not specified.)

Select the correct answer.
