ColumnTransformer
When calling .fit_transform(X) on a Pipeline, each transformer is applied to all columns, which is not always desirable. Some columns may require different encoders: for example, OrdinalEncoder for ordinal features and OneHotEncoder for nominal ones.
ColumnTransformer solves this by letting you assign different transformers to specific columns. The make_column_transformer helper function builds a ColumnTransformer for you.
make_column_transformer accepts tuples of (transformer, [columns]).
For example, applying OrdinalEncoder to 'education' and OneHotEncoder to 'gender':
ct = make_column_transformer(
    (OrdinalEncoder(), ['education']),
    (OneHotEncoder(), ['gender']),
    remainder='passthrough'
)
The remainder argument controls what happens to columns that are not listed in any tuple.
The default is 'drop', which discards those columns from the output.
To keep all other columns unchanged, set remainder='passthrough'.
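The difference between the two remainder settings is easiest to see on a small table. The sketch below uses a hypothetical two-column DataFrame (toy, with made-up values that are not taken from exams.csv) and compares the shape of the output under each setting.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    'gender': ['female', 'male', 'female'],
    'score': [72, 65, 90],
})

# Default remainder='drop': only the one-hot columns for 'gender' are returned
ct_drop = make_column_transformer(
    (OneHotEncoder(), ['gender']),
)
dropped = ct_drop.fit_transform(toy)

# remainder='passthrough': 'score' is kept and appended after the encoded columns
ct_keep = make_column_transformer(
    (OneHotEncoder(), ['gender']),
    remainder='passthrough',
)
kept = ct_keep.fit_transform(toy)

print(dropped.shape)  # two one-hot columns
print(kept.shape)     # two one-hot columns plus 'score'
```

Note that passed-through columns are placed after the transformed ones, so the column order of the output can differ from the order in the input DataFrame.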
For example, consider the exams.csv file. It contains several nominal columns ('gender', 'race/ethnicity', 'lunch', 'test preparation course') and one ordinal column, 'parental level of education'.
import pandas as pd

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/exams.csv')
print(df.head())
Using ColumnTransformer, nominal data can be transformed with OneHotEncoder and ordinal data with OrdinalEncoder in a single step.
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Education levels listed from lowest to highest, so the ordinal codes follow this order
edu_categories = ['some high school', 'high school', 'some college',
                  "associate's degree", "bachelor's degree", "master's degree"]
ct = make_column_transformer(
    # Ordinal column: encoded as integers 0..5 in the order given above
    (OrdinalEncoder(categories=[edu_categories]), ['parental level of education']),
    # Nominal columns: one binary column per category
    (OneHotEncoder(), ['gender', 'race/ethnicity', 'lunch', 'test preparation course']),
    # Keep all remaining columns unchanged
    remainder='passthrough'
)
print(ct.fit_transform(df))
The ColumnTransformer is itself a transformer, so it provides the standard methods .fit(), .fit_transform(), and .transform().