Machine Learning Foundations with Scikit-Learn

ColumnTransformer


When you call .fit_transform(X) on a Pipeline, each transformer is applied to all columns, which is not always desirable. Some columns may require different encoders: for example, OrdinalEncoder for ordinal features and OneHotEncoder for nominal ones. ColumnTransformer solves this by letting you assign a different transformer to each group of columns, and the make_column_transformer() helper builds one without requiring you to name each step.

make_column_transformer accepts tuples of (transformer, [columns]). For example, applying OrdinalEncoder to 'education' and OneHotEncoder to 'gender':

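from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

ct = make_column_transformer(
    (OrdinalEncoder(), ['education']),  # ordinal feature
    (OneHotEncoder(), ['gender']),      # nominal feature
    remainder='passthrough'             # keep all other columns unchanged
)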
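To see what a transformer like the one above produces, here is a minimal sketch on a small, hypothetical DataFrame (the rows are invented for illustration; only the column names come from the text). Without a categories argument, OrdinalEncoder orders the categories alphabetically, so here 'bachelor' = 0, 'high school' = 1, 'master' = 2.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy data; only the column names come from the text above
df = pd.DataFrame({
    'education': ['bachelor', 'master', 'high school', 'master'],
    'gender': ['female', 'male', 'male', 'female'],
})

ct = make_column_transformer(
    (OrdinalEncoder(), ['education']),  # alphabetical order by default
    (OneHotEncoder(), ['gender']),      # one column per category: female, male
    remainder='passthrough',            # no other columns here, so no effect
)

result = ct.fit_transform(df)
print(result.shape)  # (4, 3): 1 ordinal column + 2 one-hot columns
print(result)
```

The ordinal column comes first and the one-hot columns follow, matching the order of the tuples passed to make_column_transformer.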
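Note

remainder controls what happens to columns that are not listed in any tuple. The default, remainder='drop', removes them from the output. To keep them unchanged, set remainder='passthrough'; the untouched columns are then appended after the transformed ones.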
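A minimal sketch comparing the two settings, assuming a hypothetical DataFrame with an extra numeric 'age' column that no transformer lists:

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: 'age' is not listed in any transformer tuple
df = pd.DataFrame({
    'gender': ['female', 'male', 'male'],
    'age': [25, 32, 47],
})

# Default remainder='drop': 'age' disappears from the output
ct_drop = make_column_transformer((OneHotEncoder(), ['gender']))

# remainder='passthrough': 'age' is appended unchanged after the encoded columns
ct_keep = make_column_transformer(
    (OneHotEncoder(), ['gender']),
    remainder='passthrough',
)

out_drop = ct_drop.fit_transform(df)
out_keep = ct_keep.fit_transform(df)
print(out_drop.shape)  # (3, 2)
print(out_keep.shape)  # (3, 3)
```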

For example, consider the exams.csv file. It contains several nominal columns ('gender', 'race/ethnicity', 'lunch', 'test preparation course') and one ordinal column, 'parental level of education'.

import pandas as pd

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/exams.csv')
print(df.head())

Using ColumnTransformer, nominal data can be transformed with OneHotEncoder and ordinal data with OrdinalEncoder in a single step.

from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Education levels from lowest to highest; OrdinalEncoder maps them to 0..5
edu_categories = ['some high school', 'high school', 'some college',
                  "associate's degree", "bachelor's degree", "master's degree"]

ct = make_column_transformer(
    (OrdinalEncoder(categories=[edu_categories]), ['parental level of education']),
    (OneHotEncoder(), ['gender', 'race/ethnicity', 'lunch', 'test preparation course']),
    remainder='passthrough'  # keep the numeric score columns unchanged
)

# df is the DataFrame loaded from exams.csv above
print(ct.fit_transform(df))
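Passing categories explicitly matters because OrdinalEncoder's default (categories='auto') sorts the values alphabetically, which does not follow the education hierarchy. A small sketch with a few hypothetical rows of the 'parental level of education' column:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Education levels listed from lowest to highest
edu_categories = ['some high school', 'high school', 'some college',
                  "associate's degree", "bachelor's degree", "master's degree"]

# Hypothetical sample rows
df = pd.DataFrame({'parental level of education':
                   ["master's degree", 'some high school', 'high school']})

explicit = OrdinalEncoder(categories=[edu_categories])  # order taken from the list
auto = OrdinalEncoder()  # categories='auto': sorted alphabetically

codes_explicit = explicit.fit_transform(df).ravel()
codes_auto = auto.fit_transform(df).ravel()
print(codes_explicit)  # [5. 0. 1.]
print(codes_auto)      # [1. 2. 0.]
```

With the alphabetical default, 'some high school' receives a higher code than "master's degree", so the numbers no longer reflect the real ordering.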

A ColumnTransformer is itself a transformer, so it provides the standard .fit(), .fit_transform(), and .transform() methods and can be used as a step in a Pipeline.
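A minimal sketch of fitting once and reusing the learned encodings on new rows; the 'education', 'gender', and 'income' columns and their values are hypothetical:

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical training data and a new row with the same columns
train = pd.DataFrame({
    'education': ['high school', 'bachelor', 'master'],
    'gender': ['female', 'male', 'female'],
    'income': [30000, 52000, 61000],
})
new = pd.DataFrame({
    'education': ['master'],
    'gender': ['male'],
    'income': [58000],
})

ct = make_column_transformer(
    (OrdinalEncoder(categories=[['high school', 'bachelor', 'master']]), ['education']),
    (OneHotEncoder(), ['gender']),
    remainder='passthrough',  # 'income' is kept as-is, placed last
)

ct.fit(train)                    # learns the categories from the training data only
new_encoded = ct.transform(new)  # reuses those learned categories
print(new_encoded)
```

The new row becomes education = 2, gender one-hot [0, 1] (female, male), and income unchanged. Fitting only on training data and calling .transform() afterwards keeps the encoding consistent between datasets.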


Suppose you have a dataset with the features 'education', 'income', and 'job'. What will happen to the 'income' column after running the following code? (Note that the remainder argument is not specified.)

Choose the correct answer.


Section 1. Chapter 18
