ColumnTransformer | Pipelines
ML Introduction with scikit-learn

ColumnTransformer

Looking ahead: when we call the .fit_transform(X) method on a Pipeline object, it applies each transformer to the whole of X.
That is not always the behavior we want.
We do not want to encode values that are already numerical, and we may want to apply different transformers to different columns (e.g., OrdinalEncoder for ordinal features and OneHotEncoder for nominal ones).

ColumnTransformer addresses this problem. It lets us apply each transformer only to a chosen subset of columns.
To create a ColumnTransformer, you can use the make_column_transformer function from the sklearn.compose module.

The function takes tuples as arguments, each pairing a transformer with the list of columns to which that transformer should be applied.
Here is an example:
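The original code for this example is not available here, so the following is a minimal sketch of what such a call might look like. The toy DataFrame, its values, the 'age' column, and the category order passed to OrdinalEncoder are all assumptions made for illustration; only the 'gender' and 'education' column names come from the text.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy data; 'age' stands in for any column not listed below
df = pd.DataFrame({
    'gender': ['Male', 'Female', 'Female'],
    'education': ['HS', 'BSc', 'MSc'],
    'age': [25, 32, 41],
})

# Each tuple pairs a transformer with the columns it should handle
ct = make_column_transformer(
    (OneHotEncoder(), ['gender']),
    # The category order is an assumption chosen for this toy data
    (OrdinalEncoder(categories=[['HS', 'BSc', 'MSc']]), ['education']),
    remainder='passthrough',  # keep 'age' as it is
)

result = ct.fit_transform(df)
print(result)
```

The output has two one-hot columns for 'gender', one ordinal column for 'education', and the untouched 'age' column.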

Notice the remainder argument at the end. It specifies what to do with columns not mentioned in make_column_transformer (here, only 'gender' and 'education' are mentioned).
By default, it is set to 'drop', which means those columns are dropped from the output.
Set remainder='passthrough' to pass the other columns through untouched.
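To see the difference between the two remainder settings, one option is to compare the output shapes. This is a small sketch on made-up data; the 'age' column and all values are hypothetical.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: 'age' is not mentioned in either transformer below
df = pd.DataFrame({
    'gender': ['Male', 'Female', 'Female'],
    'age': [25, 32, 41],
})

# Default remainder='drop': 'age' is removed from the output
ct_drop = make_column_transformer((OneHotEncoder(), ['gender']))
dropped = ct_drop.fit_transform(df)

# remainder='passthrough': 'age' is appended after the encoded columns
ct_pass = make_column_transformer(
    (OneHotEncoder(), ['gender']),
    remainder='passthrough',
)
passed = ct_pass.fit_transform(df)

print(dropped.shape)  # two one-hot columns only
print(passed.shape)   # two one-hot columns plus 'age'
```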

For example, we will use an exams.csv file containing nominal columns ('gender', 'race/ethnicity', 'lunch', 'test preparation course').
It also contains an ordinal column, 'parental level of education'.

With the help of ColumnTransformer, we will transform the nominal data using OneHotEncoder and the ordinal data using OrdinalEncoder in one step.
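Since exams.csv itself is not included here, the sketch below uses a few hand-written rows with the same column names. The row values, the 'math score' column, and the education ordering are assumptions for illustration, not the contents of the real file.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical stand-in for pd.read_csv('exams.csv')
df = pd.DataFrame({
    'gender': ['female', 'male', 'female', 'male'],
    'race/ethnicity': ['group A', 'group B', 'group B', 'group C'],
    'parental level of education': [
        'high school', "bachelor's degree", 'some college', "master's degree",
    ],
    'lunch': ['standard', 'free/reduced', 'standard', 'standard'],
    'test preparation course': ['none', 'completed', 'none', 'completed'],
    'math score': [61, 75, 58, 90],  # assumed numeric column
})

nominal = ['gender', 'race/ethnicity', 'lunch', 'test preparation course']
# Assumed ordering, lowest to highest, covering only the values used above
edu_order = ['high school', 'some college', "bachelor's degree", "master's degree"]

ct = make_column_transformer(
    (OneHotEncoder(), nominal),
    (OrdinalEncoder(categories=[edu_order]), ['parental level of education']),
    remainder='passthrough',  # leave numeric columns untouched
)

encoded = ct.fit_transform(df)
print(encoded.shape)
```

Passing the categories explicitly matters for the ordinal column: without it, OrdinalEncoder sorts the categories alphabetically, which would not reflect the education levels' natural order.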

As you may have guessed, ColumnTransformer is itself a transformer, so it provides the usual transformer methods: .fit(), .fit_transform(), and .transform().
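As a rough illustration of these methods, the sketch below fits a ColumnTransformer on one DataFrame and then reuses the learned encoding on new rows. Both DataFrames and the category order are hypothetical.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical training data and new data
train = pd.DataFrame({
    'gender': ['Male', 'Female', 'Female'],
    'education': ['HS', 'BSc', 'MSc'],
})
new = pd.DataFrame({'gender': ['Female'], 'education': ['BSc']})

ct = make_column_transformer(
    (OneHotEncoder(), ['gender']),
    (OrdinalEncoder(categories=[['HS', 'BSc', 'MSc']]), ['education']),
)

ct.fit(train)                    # learn the categories from the training data
new_encoded = ct.transform(new)  # apply the same encoding to unseen rows
print(new_encoded)
```

Calling .fit_transform(train) would combine the first two steps for the training data.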

Suppose you have a dataset with the features 'education', 'income', and 'job'. What will happen to the 'income' column after running the following code? (Note that the remainder argument is not specified.)
