ColumnTransformer
When calling .fit_transform(X) on a Pipeline, each transformer is applied to all columns, which is not always desirable. Some columns may require different encoders β for example, OrdinalEncoder for ordinal features and OneHotEncoder for nominal ones.
ColumnTransformer solves this by letting you assign different transformers to specific columns using make_column_transformer.
make_column_transformer accepts tuples of (transformer, [columns]).
For example, applying OrdinalEncoder to 'education' and OneHotEncoder to 'gender':
ct = make_column_transformer(
(OrdinalEncoder(), ['education']),
(OneHotEncoder(), ['gender']),
remainder='passthrough'
)
remainder controls what happens to unspecified columns.
Default: 'drop'.
To keep all other columns unchanged, set remainder='passthrough'.
For example, consider the exams.csv file. It contains several nominal columns ('gender', 'race/ethnicity', 'lunch', 'test preparation course') and one ordinal column, 'parental level of education'.
12345import pandas as pd df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/exams.csv') print(df.head())
Using ColumnTransformer, nominal data can be transformed with OneHotEncoder and ordinal data with OrdinalEncoder in a single step.
12345678910111213from sklearn.compose import make_column_transformer from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder edu_categories = ['high school', 'some high school', 'some college', "associate's degree", "bachelor's degree", "master's degree"] ct = make_column_transformer( (OrdinalEncoder(categories=[edu_categories]), ['parental level of education']), (OneHotEncoder(), ['gender', 'race/ethnicity', 'lunch', 'test preparation course']), remainder='passthrough' ) print(ct.fit_transform(df))
The ColumnTransformer is itself a transformer, so it provides the standard methods .fit(), .fit_transform(), and .transform().
Thanks for your feedback!
Ask AI
Ask AI
Ask anything or try one of the suggested questions to begin our chat
Awesome!
Completion rate improved to 3.13
ColumnTransformer
Swipe to show menu
When calling .fit_transform(X) on a Pipeline, each transformer is applied to all columns, which is not always desirable. Some columns may require different encoders β for example, OrdinalEncoder for ordinal features and OneHotEncoder for nominal ones.
ColumnTransformer solves this by letting you assign different transformers to specific columns using make_column_transformer.
make_column_transformer accepts tuples of (transformer, [columns]).
For example, applying OrdinalEncoder to 'education' and OneHotEncoder to 'gender':
ct = make_column_transformer(
(OrdinalEncoder(), ['education']),
(OneHotEncoder(), ['gender']),
remainder='passthrough'
)
remainder controls what happens to unspecified columns.
Default: 'drop'.
To keep all other columns unchanged, set remainder='passthrough'.
For example, consider the exams.csv file. It contains several nominal columns ('gender', 'race/ethnicity', 'lunch', 'test preparation course') and one ordinal column, 'parental level of education'.
12345import pandas as pd df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/exams.csv') print(df.head())
Using ColumnTransformer, nominal data can be transformed with OneHotEncoder and ordinal data with OrdinalEncoder in a single step.
12345678910111213from sklearn.compose import make_column_transformer from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder edu_categories = ['high school', 'some high school', 'some college', "associate's degree", "bachelor's degree", "master's degree"] ct = make_column_transformer( (OrdinalEncoder(categories=[edu_categories]), ['parental level of education']), (OneHotEncoder(), ['gender', 'race/ethnicity', 'lunch', 'test preparation course']), remainder='passthrough' ) print(ct.fit_transform(df))
The ColumnTransformer is itself a transformer, so it provides the standard methods .fit(), .fit_transform(), and .transform().
Thanks for your feedback!