ColumnTransformer
Looking ahead, when you invoke the .fit_transform(X) method on a Pipeline object, it applies each transformer to the entire set of features in X. However, this behavior may not always be desired.
For instance, you might not want to encode numerical values, or you may need to apply different transformers to specific columns, such as OrdinalEncoder for ordinal features and OneHotEncoder for nominal features.
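To make the problem concrete, here is a minimal sketch of what happens when an encoder is applied to every column at once. The toy DataFrame and its column names are made up for illustration and are not part of the lesson's dataset.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one nominal column and one numeric column
toy = pd.DataFrame({
    'gender': ['F', 'M', 'F'],
    'score': [70, 85, 70],
})

# Applied to the whole DataFrame, OneHotEncoder treats 'score' as categorical too
encoded = OneHotEncoder().fit_transform(toy).toarray()
print(encoded.shape)  # 2 gender categories + 2 distinct scores
print(encoded)
```

The numeric column is split into one indicator column per distinct value, which is rarely what you want for a numerical feature.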
The ColumnTransformer resolves this issue by allowing each column to be treated separately. To create a ColumnTransformer, you can use the make_column_transformer function from the sklearn.compose module.
The function takes tuples as arguments. Each tuple pairs a transformer with the list of columns to which that transformer should be applied.
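For example, we can create a ColumnTransformer that applies the OrdinalEncoder only to the 'education' column and the OneHotEncoder only to the 'gender' column. Setting remainder='passthrough' leaves all other columns unchanged; by default, they would be dropped.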
ct = make_column_transformer(
    (OrdinalEncoder(), ['education']),
    (OneHotEncoder(), ['gender']),
    remainder='passthrough'
)
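The snippet above can be tried on a small table. The following is a minimal sketch under stated assumptions: the DataFrame, its values, and the 'age' column are hypothetical. Unlike the snippet, it passes an explicit category order to OrdinalEncoder, because by default the encoder sorts the categories it finds in the data rather than ranking them by meaning.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy data: one ordinal, one nominal, and one numeric column
toy = pd.DataFrame({
    'education': ['HS', 'BSc', 'MSc', 'HS'],
    'gender': ['F', 'M', 'F', 'M'],
    'age': [25, 32, 41, 19],
})

ct = make_column_transformer(
    # Explicit order (an assumption for this sketch): HS < BSc < MSc
    (OrdinalEncoder(categories=[['HS', 'BSc', 'MSc']]), ['education']),
    (OneHotEncoder(), ['gender']),
    remainder='passthrough',  # keep 'age' unchanged
    sparse_threshold=0        # always return a dense NumPy array
)

result = ct.fit_transform(toy)
print(result)
```

The output columns follow the order of the tuples: the encoded 'education' column first, then the one-hot columns for 'gender', then the untouched 'age' column.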
As a concrete dataset, we will use an exams.csv file containing nominal columns ('gender', 'race/ethnicity', 'lunch', 'test preparation course'). It also contains an ordinal column, 'parental level of education'.
import pandas as pd

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/exams.csv')
print(df.head())
With the help of ColumnTransformer, we can simultaneously transform nominal data using OneHotEncoder and ordinal data using OrdinalEncoder in a single step.
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/exams.csv')

# Ordered categories of parental level of education for OrdinalEncoder,
# from lowest to highest ('some high school' ranks below 'high school')
edu_categories = ['some high school', 'high school', 'some college',
                  "associate's degree", "bachelor's degree", "master's degree"]

# Making a column transformer
ct = make_column_transformer(
    (OrdinalEncoder(categories=[edu_categories]), ['parental level of education']),
    (OneHotEncoder(), ['gender', 'race/ethnicity', 'lunch', 'test preparation course']),
    remainder='passthrough'  # keep the remaining columns unchanged
)

print(ct.fit_transform(df))

As you might expect, the ColumnTransformer is itself a transformer, so it provides the usual transformer methods, such as .fit(), .fit_transform(), and .transform().