Learn ColumnTransformer | Section
Machine Learning Foundations with Scikit-Learn

ColumnTransformer


When calling .fit_transform(X) on a Pipeline, each transformer is applied to all columns, which is not always desirable. Some columns may require different encoders: for example, OrdinalEncoder for ordinal features and OneHotEncoder for nominal ones. ColumnTransformer solves this by letting you assign different transformers to specific columns. The helper function make_column_transformer builds one for you.

make_column_transformer accepts tuples of (transformer, [columns]). For example, applying OrdinalEncoder to 'education' and OneHotEncoder to 'gender':

ct = make_column_transformer(
   (OrdinalEncoder(), ['education']),
   (OneHotEncoder(), ['gender']),
   remainder='passthrough'
)
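To see what this transformer produces, here is a minimal runnable sketch. The DataFrame `toy`, its values, and the category labels 'HS', 'BSc', 'MSc' are hypothetical and invented only for illustration; the explicit `categories` argument is an assumption added so that the ordinal codes follow a meaningful order.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy data, invented for illustration only
toy = pd.DataFrame({
    'education': ['HS', 'BSc', 'MSc', 'BSc'],
    'gender': ['F', 'M', 'F', 'M'],
    'age': [25, 32, 41, 29],
})

ct = make_column_transformer(
    # Explicit category order so that HS < BSc < MSc
    (OrdinalEncoder(categories=[['HS', 'BSc', 'MSc']]), ['education']),
    # One new column per distinct gender value
    (OneHotEncoder(), ['gender']),
    # Keep 'age' unchanged
    remainder='passthrough',
)

result = ct.fit_transform(toy)
print(result)
```

The output columns follow the order of the tuples: first the encoded 'education' column, then the one-hot columns for 'gender', and finally the passed-through 'age' column.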
Note

remainder controls what happens to columns that are not assigned to any transformer. The default is 'drop', which discards them. To keep all other columns unchanged, set remainder='passthrough'.
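The difference between the two settings can be checked by comparing output shapes. This is a small sketch on hypothetical data; the DataFrame `toy` and the 'score' column are made up for the example.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data, invented for illustration only
toy = pd.DataFrame({
    'education': ['HS', 'BSc', 'MSc'],
    'score': [61, 74, 88],
})
order = [['HS', 'BSc', 'MSc']]

# remainder is left at its default ('drop'): 'score' is discarded
ct_drop = make_column_transformer(
    (OrdinalEncoder(categories=order), ['education']),
)

# remainder='passthrough': 'score' is appended unchanged
ct_keep = make_column_transformer(
    (OrdinalEncoder(categories=order), ['education']),
    remainder='passthrough',
)

dropped = ct_drop.fit_transform(toy)
kept = ct_keep.fit_transform(toy)

print(dropped.shape)  # (3, 1)
print(kept.shape)     # (3, 2)
```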

For example, consider the exams.csv file. It contains several nominal columns ('gender', 'race/ethnicity', 'lunch', 'test preparation course') and one ordinal column, 'parental level of education'.

import pandas as pd

# Load the exams dataset and preview the first rows
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/exams.csv')
print(df.head())

Using ColumnTransformer, nominal data can be transformed with OneHotEncoder and ordinal data with OrdinalEncoder in a single step.

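from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Education levels listed from lowest to highest, so the ordinal codes reflect the order
edu_categories = ['some high school', 'high school', 'some college', "associate's degree", "bachelor's degree", "master's degree"]
ct = make_column_transformer(
   # Ordinal column: encoded as integers following edu_categories
   (OrdinalEncoder(categories=[edu_categories]), ['parental level of education']),
   # Nominal columns: one-hot encoded
   (OneHotEncoder(), ['gender', 'race/ethnicity', 'lunch', 'test preparation course']),
   # Keep the remaining columns unchanged
   remainder='passthrough'
)
# df is the DataFrame loaded from exams.csv in the previous snippet
print(ct.fit_transform(df))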

The ColumnTransformer is itself a transformer, so it provides the standard methods .fit(), .fit_transform(), and .transform().
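Because of this, a ColumnTransformer can be fitted on one dataset and then reused on new rows, as with any other transformer. The sketch below uses two small hypothetical DataFrames, `train` and `new`, invented for the example; sparse_threshold=0 is an added assumption that forces dense output so the result prints as a plain array.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data, invented for illustration only
train = pd.DataFrame({
    'lunch': ['standard', 'free/reduced', 'standard'],
    'score': [70, 55, 88],
})
new = pd.DataFrame({'lunch': ['free/reduced'], 'score': [61]})

ct = make_column_transformer(
    (OneHotEncoder(), ['lunch']),
    remainder='passthrough',
    sparse_threshold=0,  # always return a dense array
)

# Learn the categories from the training data only
ct.fit(train)

# Apply the already-fitted encoding to unseen rows
encoded_new = ct.transform(new)
print(encoded_new)
```

Fitting once and calling .transform() afterwards keeps the column layout identical between the training data and any later data.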


Suppose you have a dataset with features 'education', 'income', 'job'. What will happen with the 'income' column after running the following code? (Notice that the remainder argument is not specified)


Section 1. Chapter 18
