Learn ColumnTransformer | Pipelines
Introduction to Machine Learning with Python

ColumnTransformer

When calling .fit_transform(X) on a Pipeline, each transformer is applied to all columns, which is not always desirable. Some columns may require different encoders: for example, OrdinalEncoder for ordinal features and OneHotEncoder for nominal ones. ColumnTransformer solves this by letting you assign different transformers to specific columns. The make_column_transformer function is a convenient way to build one.

make_column_transformer accepts tuples of (transformer, [columns]). For example, applying OrdinalEncoder to 'education' and OneHotEncoder to 'gender':

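from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

ct = make_column_transformer(
    (OrdinalEncoder(), ['education']),
    (OneHotEncoder(), ['gender']),
    remainder='passthrough'
)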
Note

The remainder argument controls what happens to columns that are not assigned to any transformer. The default, 'drop', discards them. To keep all other columns unchanged, set remainder='passthrough'.
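The difference between the two settings can be sketched on a small example. The DataFrame below is hypothetical, and build is a helper introduced only for this illustration.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy data; 'income' is not assigned to any transformer.
toy = pd.DataFrame({
    'education': ['school', 'college', 'school'],
    'gender': ['female', 'male', 'male'],
    'income': [30, 55, 42],
})

def build(remainder):
    """Build the same ColumnTransformer with a given remainder setting."""
    return make_column_transformer(
        (OrdinalEncoder(), ['education']),
        (OneHotEncoder(), ['gender']),
        remainder=remainder,
    )

dropped = build('drop').fit_transform(toy)
kept = build('passthrough').fit_transform(toy)

print(dropped.shape)  # (3, 3): 'income' is discarded
print(kept.shape)     # (3, 4): 'income' is appended as the last column
```

Passed-through columns are placed after the outputs of the listed transformers, so the column order of the result can differ from the input.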

For example, consider the exams.csv file. It contains several nominal columns ('gender', 'race/ethnicity', 'lunch', 'test preparation course') and one ordinal column, 'parental level of education'.

import pandas as pd

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/exams.csv')
print(df.head())

Using ColumnTransformer, nominal data can be transformed with OneHotEncoder and ordinal data with OrdinalEncoder in a single step.

from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Education levels ordered from lowest to highest.
# 'some high school' means high school was not completed, so it comes first.
edu_categories = ['some high school', 'high school', 'some college',
                  "associate's degree", "bachelor's degree", "master's degree"]

# df is the DataFrame loaded from exams.csv in the previous snippet.
ct = make_column_transformer(
    (OrdinalEncoder(categories=[edu_categories]), ['parental level of education']),
    (OneHotEncoder(), ['gender', 'race/ethnicity', 'lunch', 'test preparation course']),
    remainder='passthrough'
)

print(ct.fit_transform(df))

The ColumnTransformer is itself a transformer, so it provides the standard methods .fit(), .fit_transform(), and .transform().
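As a minimal sketch of those methods, the encoding can be learned once with .fit() and then reused on new rows with .transform(). The two DataFrames below are hypothetical, and sparse_threshold=0 is used only to get a dense array back.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical training data and one unseen row.
train = pd.DataFrame({
    'education': ['school', 'college', 'school', 'college'],
    'gender': ['female', 'male', 'male', 'female'],
})
new_rows = pd.DataFrame({
    'education': ['college'],
    'gender': ['male'],
})

ct = make_column_transformer(
    (OrdinalEncoder(categories=[['school', 'college']]), ['education']),
    (OneHotEncoder(), ['gender']),
    sparse_threshold=0,  # always return a dense array
)

ct.fit(train)                         # learn the categories from the training data
encoded_new = ct.transform(new_rows)  # apply the same encoding to unseen rows
print(encoded_new)
```

Because the interface matches any other transformer, a ColumnTransformer can also be used as a step inside a Pipeline.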

Suppose you have a dataset with features 'education', 'income', 'job'. What will happen with the 'income' column after running the following code? (Notice that the remainder argument is not specified)


Section 3. Chapter 2
