Categorical Encoding

Feature-engine’s categorical encoders replace the categories of a variable with numbers that are either estimated from the data or assigned arbitrarily.

Summary of the characteristics of Feature-engine’s encoders

Transformer              Regression  Classification  Multi-class  Description
-----------------------  ----------  --------------  -----------  -----------------------------------------------------------
OneHotEncoder()          √           √               √            Adds dummy variables to represent each category
OrdinalEncoder()         √           √               √            Replaces categories with an integer
CountFrequencyEncoder()  √           √               √            Replaces categories with their count or frequency
MeanEncoder()            √           √               x            Replaces categories with the target mean value
WoEEncoder()             x           √               x            Replaces categories with the weight of the evidence
DecisionTreeEncoder()    √           √               √            Replaces categories with the predictions of a decision tree
RareLabelEncoder()       √           √               √            Groups infrequent categories into a single one
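To make the descriptions in the table more concrete, the following is a minimal sketch of three of the encodings (count or frequency, target mean, and weight of evidence) written in plain Python. It is not Feature-engine's implementation: the function names and the toy data are hypothetical, and the weight of evidence uses the common definition ln(P(category | y=1) / P(category | y=0)) for a binary 0/1 target, which is assumed here.

```python
import math
from collections import Counter, defaultdict


def count_encoding(values):
    """Map each category to the number of times it appears."""
    return dict(Counter(values))


def frequency_encoding(values):
    """Map each category to the fraction of observations it accounts for."""
    counts = Counter(values)
    return {cat: n / len(values) for cat, n in counts.items()}


def mean_encoding(values, target):
    """Map each category to the mean of the target within that category."""
    sums = defaultdict(float)
    counts = Counter()
    for cat, y in zip(values, target):
        sums[cat] += y
        counts[cat] += 1
    return {cat: sums[cat] / counts[cat] for cat in counts}


def woe_encoding(values, target):
    """Weight of evidence for a binary 0/1 target (assumed definition):
    ln( P(category | y=1) / P(category | y=0) )."""
    pos = Counter(cat for cat, y in zip(values, target) if y == 1)
    neg = Counter(cat for cat, y in zip(values, target) if y == 0)
    total_pos, total_neg = sum(pos.values()), sum(neg.values())
    woe = {}
    for cat in set(values):
        if pos[cat] == 0 or neg[cat] == 0:
            # The ratio is zero or undefined when a category has only one class.
            raise ValueError(f"category {cat!r} has no positive or no negative cases")
        woe[cat] = math.log((pos[cat] / total_pos) / (neg[cat] / total_neg))
    return woe


# Hypothetical toy data: one categorical variable and a binary target.
colours = ["red", "red", "blue", "blue", "blue", "green", "green", "green"]
bought = [1, 0, 1, 1, 0, 0, 0, 1]

print(count_encoding(colours))
print(frequency_encoding(colours))
print(mean_encoding(colours, bought))
print(woe_encoding(colours, bought))
```

The weight of evidence is only defined for a binary target, which is consistent with WoEEncoder() being marked as unsuitable for regression and multi-class problems in the table.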

Feature-engine’s categorical encoders encode only variables of type categorical or object by default. From version 1.1.0, you can set the parameter ignore_format to True so that the transformers also accept numerical variables as input.

Other categorical encoding libraries

For additional categorical encoding transformations, visit the open-source package Category encoders.