ColumnTransformer
Jumping ahead, when we call the .fit_transform(X) method on a Pipeline object, it applies each transformer to the whole X.
But that is not the behavior we want.
We do not want to encode already numerical values, or we may want to apply different transformers to different columns (e.g., OrdinalEncoder for ordinal features and OneHotEncoder for nominal).
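To make the problem concrete, here is a minimal sketch of what happens when OneHotEncoder is applied to an entire DataFrame. The small DataFrame and its column names are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Made-up data: one nominal column and one numerical column
df = pd.DataFrame({
    'gender': ['female', 'male', 'female'],
    'score': [72, 85, 90],
})

# Applied to the whole DataFrame, the encoder treats every column as categorical
encoder = OneHotEncoder()
encoded = encoder.fit_transform(df).toarray()

# 2 columns for 'gender' plus 3 columns for the three distinct 'score' values
print(encoded.shape)
```

The numerical 'score' column is split into one column per distinct value, which is rarely what we want.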
The ColumnTransformer transformer addresses this problem. It allows us to treat each column separately.
To create a ColumnTransformer, you can use the make_column_transformer function from the sklearn.compose module.
![](https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/MakeTransFunc.png)
The function takes as arguments tuples with the transformer and the list of columns to which this transformer should be applied.
Here is an example:
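A minimal sketch of such a call, assuming a DataFrame with 'gender' and 'education' columns plus one numerical column. The data, the 'score' column, and the order of the education levels are illustrative assumptions.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Illustrative data (an assumption, not the lesson's dataset)
df = pd.DataFrame({
    'gender': ['female', 'male', 'female'],
    'education': ['high school', 'bachelor', 'master'],
    'score': [72, 85, 90],
})

# Assumed ranking of the ordinal categories, lowest to highest
edu_order = [['high school', 'bachelor', 'master']]

ct = make_column_transformer(
    (OneHotEncoder(), ['gender']),                           # nominal column
    (OrdinalEncoder(categories=edu_order), ['education']),   # ordinal column
    remainder='passthrough'                                  # keep 'score'
)

result = ct.fit_transform(df)
print(result)
```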
Notice the remainder argument at the end. It specifies what to do with columns not mentioned in make_column_transformer (here only 'gender' and 'education' are mentioned).
By default, it is set to 'drop', which means those columns will be dropped.
You need to set remainder='passthrough' to pass the other columns through untouched.
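The difference between the two settings can be checked on a small made-up DataFrame. This is a sketch under that assumption; the column names are illustrative.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder

# Made-up data: 'score' is not mentioned in either transformer below
df = pd.DataFrame({
    'gender': ['female', 'male', 'female'],
    'score': [72, 85, 90],
})

# Default remainder='drop': the unmentioned 'score' column is discarded
ct_drop = make_column_transformer((OneHotEncoder(), ['gender']))
dropped = ct_drop.fit_transform(df)

# remainder='passthrough': 'score' is kept and placed after the encoded columns
ct_pass = make_column_transformer(
    (OneHotEncoder(), ['gender']),
    remainder='passthrough'
)
passed = ct_pass.fit_transform(df)

print(dropped.shape)  # only the two one-hot columns
print(passed.shape)   # the two one-hot columns plus 'score'
```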
For example, we will use an exams.csv file containing nominal columns ('gender', 'race/ethnicity', 'lunch', 'test preparation course').
It also contains an ordinal column, 'parental level of education'.
With the help of ColumnTransformer, we will transform the nominal data using OneHotEncoder and the ordinal data using OrdinalEncoder in one step.
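The sketch below shows how this could look. To stay self-contained it does not read exams.csv. Instead it builds a few made-up rows with the same column names. The cell values and the ranking of education levels are assumptions about the dataset, not taken from the file.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Stand-in for pd.read_csv('exams.csv'): made-up rows with the same column names
df = pd.DataFrame({
    'gender': ['male', 'female', 'female'],
    'race/ethnicity': ['group A', 'group C', 'group B'],
    'parental level of education': ['high school', "bachelor's degree", "master's degree"],
    'lunch': ['standard', 'free/reduced', 'standard'],
    'test preparation course': ['none', 'completed', 'none'],
})

nominal = ['gender', 'race/ethnicity', 'lunch', 'test preparation course']
ordinal = ['parental level of education']

# Assumed ranking of education levels, lowest to highest
edu_categories = [[
    'some high school', 'high school', 'some college',
    "associate's degree", "bachelor's degree", "master's degree",
]]

ct = make_column_transformer(
    (OneHotEncoder(), nominal),
    (OrdinalEncoder(categories=edu_categories), ordinal),
    remainder='passthrough'
)

transformed = ct.fit_transform(df)
print(transformed.shape)
```

Passing the category order explicitly matters for OrdinalEncoder: without it, the categories are sorted alphabetically, which would not reflect the actual ranking.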
As you may have guessed, ColumnTransformer is a transformer, so it has all the methods needed for a transformer (.fit(), .fit_transform(), .transform()).
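As a rough illustration of these methods, the sketch below fits a ColumnTransformer on one made-up DataFrame and then reuses it on new rows. Both DataFrames are assumptions for the example.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder

# Made-up training data and unseen rows with the same columns
train = pd.DataFrame({'gender': ['female', 'male', 'female'], 'score': [72, 85, 90]})
new = pd.DataFrame({'gender': ['male', 'female'], 'score': [60, 95]})

ct = make_column_transformer(
    (OneHotEncoder(), ['gender']),
    remainder='passthrough'
)

ct.fit(train)                    # learn the categories from the training data
new_encoded = ct.transform(new)  # apply the learned encoding to unseen rows

# .fit_transform() performs both steps at once on the same data
train_encoded = ct.fit_transform(train)

print(new_encoded)
```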