.. _mean_encoder:

.. currentmodule:: feature_engine.encoding

MeanEncoder
===========

Mean encoding is the process of replacing the categories in categorical features with the
mean value of the target variable shown by each category. For example, if we are trying
to predict the default rate (that's the target variable), and our dataset has the categorical
variable **City**, with the categories **London**, **Manchester**, and **Bristol**,
and the default rate per city is 0.1, 0.5, and 0.3, respectively, then with mean encoding we
would replace London with 0.1, Manchester with 0.5, and Bristol with 0.3.
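As a quick illustration, the City example can be reproduced with plain pandas; the dataframe below is made up so that the default rate per city matches the values in the text:

```python
import pandas as pd

# Toy data constructed so the default rate per city matches the example:
# London 0.1, Manchester 0.5, Bristol 0.3.
df = pd.DataFrame({
    "City": ["London"] * 10 + ["Manchester"] * 10 + ["Bristol"] * 10,
    "default": [1] + [0] * 9 + [1] * 5 + [0] * 5 + [1] * 3 + [0] * 7,
})

# The mean of the target per category is the mean-encoding mapping.
mapping = df.groupby("City")["default"].mean()

# Replace each city with its mean target value.
df["City_encoded"] = df["City"].map(mapping)
print(mapping.to_dict())
```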

Mean encoding, together with one hot encoding and ordinal encoding, is among the most
commonly used categorical encoding techniques in data science.

Mean encoding is said to easily cause overfitting. That's because we capture some
information about the target in the predictive features during the encoding. More
importantly, overfitting can be caused by encoding low-frequency categories with mean
target values that are unreliable. In short, the mean target values seen for those
categories in the training set do not hold for test data or new observations.

Overfitting
-----------

When the categories in the categorical features have a good representation, or, in other
words, when there are enough observations in our dataset that show the categories that we
want to encode, then taking the simple average of the target variable per category is a
good approximation. We can trust that a new data point, say from the test data, that
shows that category will also have a target value similar to the target mean value that
we calculated for said category during training.

However, if there are only a few observations that show some of the categories, then the
mean target value for those categories will be unreliable. In other words, the certainty
that a new observation showing this category will have a target value close to the one
we estimated decreases.

To account for the uncertainty of the encoding values for rare categories, what we normally
do is **"blend"** the mean target value per category with the general mean of the target,
calculated over the entire training dataset. This blending is proportional to the
variability of the target within that category and the category frequency.

Smoothing
---------

To avoid overfitting, we can determine the mean target value estimates as a mixture of two
values: the mean target value per category (known as the posterior) and the mean target
value in the entire dataset (known as the prior).

The following formula shows the estimation of the mean target value with smoothing:

.. math::

    mapping = w_i \times posterior + (1 - w_i) \times prior

The prior and posterior values are "blended" using a weighting factor (`w_i`). This weighting
factor is a function of the category group size (`n_i`) and the variance of the target in
the data (`t`) and within the category (`s`):

.. math::

    w_i = \frac{n_i t}{s + n_i t}

When the category group is large, the weighting factor is close to 1, and therefore more
weight is given to the posterior (the mean of the target per category). When the category
group size is small, the weight gets closer to 0, and more weight is given to the
prior (the mean of the target in the entire dataset).

In addition, if the variability of the target within the category is large, we also give
more weight to the prior, whereas if it is small, we give more weight to the posterior.
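The effect of the weighting factor can be checked with a small Python sketch; all numbers below (group sizes, variances, prior, and posterior) are made up for illustration:

```python
def blended_encoding(n_i, t, s, prior, posterior):
    """Blend the posterior and prior using the weight w_i = n_i * t / (s + n_i * t)."""
    w_i = n_i * t / (s + n_i * t)
    return w_i * posterior + (1 - w_i) * prior

# Illustrative values: overall target variance t, within-category variance s,
# overall target mean (prior) and per-category target mean (posterior).
t, s = 0.2, 0.25
prior, posterior = 0.30, 0.60

# Frequent category: the weight approaches 1, the encoding approaches the posterior.
print(blended_encoding(500, t, s, prior, posterior))

# Rare category: the weight shrinks, pulling the encoding toward the prior.
print(blended_encoding(2, t, s, prior, posterior))
```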

In short, adding smoothing can help prevent overfitting when the categorical
data have many infrequent categories or show high cardinality.

High cardinality
----------------

High cardinality refers to a high number of unique categories in the categorical features.
Mean encoding was specifically designed to tackle highly cardinal variables by taking
advantage of the smoothing function, which essentially blends infrequent categories
together by replacing them with values very close to the overall target mean calculated
over the training data.

Another encoding method that tackles cardinality out of the box is count encoding. See, for
example, :class:`CountFrequencyEncoder`.

To account for highly cardinal variables in alternative encoding methods, you can group
rare categories together by using the :class:`RareLabelEncoder`.

Alternative Python implementations of mean encoding
---------------------------------------------------

In Feature-engine, we blend the probabilities considering the target variability and the
category frequency. In the original paper, there are alternative formulations to determine
the blending. If you want to check those out, use the transformers from the Python library
Category Encoders:

- `M-estimate `_
- `Target Encoder `_

Mean encoder
------------

Feature-engine's :class:`MeanEncoder()` replaces categories with the mean of the target per
category. By default, it does not implement smoothing. That means that it replaces
categories with the mean target value as determined over the training data
(just the posterior).

To apply smoothing using the formulation that we described earlier, set the parameter
`smoothing` to `"auto"`. That is our recommended option. Alternatively, you can
set the parameter `smoothing` to any value that you want, in which case the weighting
factor `w_i` will be calculated like this:

.. math::

    w_i = \frac{n_i}{s + n_i}

where `s` is the value you pass to `smoothing`.
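To see how a fixed `smoothing` value behaves, here is a small sketch (the group sizes and the value of `s` below are made up):

```python
def weight(n_i, s):
    """Weight given to the per-category mean when `smoothing` is the fixed value s."""
    return n_i / (s + n_i)

# With s = 10, small groups lean on the prior, large groups on the posterior.
for n_i in (1, 10, 100, 1000):
    print(n_i, round(weight(n_i, s=10), 3))
```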

Unseen categories
-----------------

Unseen categories are labels that were not seen during training; in other words,
categories that were not present in the training data.

With the :class:`MeanEncoder()`, we can take care of unseen categories in one of three ways:

- We can set the mean encoder to ignore unseen categories, in which case those categories will be replaced by nan.
- We can set the mean encoder to raise an error when it encounters unseen categories. This is useful when we don't expect new categories for those categorical variables.
- We can instruct the mean encoder to replace unseen or new categories with the mean of the target shown in the training data, that is, the prior.
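The three behaviours can be mimicked with plain pandas, since `Series.map` returns NaN for labels missing from the mapping. This is only a sketch with made-up data, not Feature-engine's implementation:

```python
import pandas as pd

train = pd.DataFrame({"city": ["London", "London", "Bristol", "Bristol"],
                      "target": [1, 0, 1, 1]})
test = pd.DataFrame({"city": ["London", "Paris"]})  # "Paris" was not seen in training

mapping = train.groupby("city")["target"].mean()
prior = train["target"].mean()

# 1) ignore: unseen categories become NaN.
ignored = test["city"].map(mapping)

# 2) raise: fail loudly when an unseen category appears.
unseen = set(test["city"]) - set(mapping.index)
# if unseen: raise ValueError(f"unseen categories: {unseen}")

# 3) encode: replace unseen categories with the prior.
encoded = test["city"].map(mapping).fillna(prior)
```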

Mean encoding and machine learning
----------------------------------

Feature-engine's :class:`MeanEncoder()` can perform mean encoding for regression and binary
classification datasets. At the moment, we do not support multiclass targets.

Python examples
---------------

In the following sections, we'll show the functionality of :class:`MeanEncoder()` using the
Titanic dataset.

First, let's load the libraries, functions and classes:

.. code:: python

    from sklearn.model_selection import train_test_split
    from feature_engine.datasets import load_titanic
    from feature_engine.encoding import MeanEncoder

To avoid data leakage, it is important to separate the data into training and test sets.
The mean target values, with or without smoothing, will be determined using the training
data only.

Let's load and split the data:

.. code:: python

    X, y = load_titanic(
        return_X_y_frame=True,
        handle_missing=True,
        predictors_only=True,
        cabin="letter_only",
    )

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0,
    )

    print(X_train.head())

We see the resulting dataframe containing three categorical columns: sex, cabin and embarked:

.. code:: python

          pclass     sex        age  sibsp  parch     fare cabin embarked
    501        2  female  13.000000      0      1  19.5000     M        S
    588        2  female   4.000000      1      1  23.0000     M        S
    402        2  female  30.000000      1      0  13.8583     M        C
    1193       3    male  29.881135      0      0   7.7250     M        Q
    686        3  female  22.000000      0      0   7.7250     M        Q

Simple mean encoding
~~~~~~~~~~~~~~~~~~~~

Let's set up the :class:`MeanEncoder()` to replace the categories in the categorical
features with the target mean, without smoothing:

.. code:: python

    encoder = MeanEncoder(
        variables=['cabin', 'sex', 'embarked'],
    )

    encoder.fit(X_train, y_train)

With `fit()`, the encoder learns the target mean value for each category and stores those
values in the `encoder_dict_` attribute:

.. code:: python

    encoder.encoder_dict_

The `encoder_dict_` contains the mean value of the target per category, per variable.
We can use this dictionary to map the numbers in the encoded features to the original
categorical values.

.. code:: python

    {'cabin': {'A': 0.5294117647058824,
      'B': 0.7619047619047619,
      'C': 0.5633802816901409,
      'D': 0.71875,
      'E': 0.71875,
      'F': 0.6666666666666666,
      'G': 0.5,
      'M': 0.30484330484330485,
      'T': 0.0},
     'sex': {'female': 0.7283582089552239, 'male': 0.18760757314974183},
     'embarked': {'C': 0.553072625698324,
      'Missing': 1.0,
      'Q': 0.37349397590361444,
      'S': 0.3389570552147239}}
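Because the mapping is stored as a plain dictionary, we can also invert it to go from the encoded numbers back to the original labels. This is a sketch that assumes the mean values are unique within each variable; the numbers are taken from the `sex` mapping above:

```python
import pandas as pd

# A subset of the mappings shown above.
sex_mapping = {'female': 0.7283582089552239, 'male': 0.18760757314974183}

# Invert the mapping: encoded number -> original category.
inverse = {value: label for label, value in sex_mapping.items()}

encoded = pd.Series([0.7283582089552239, 0.18760757314974183, 0.7283582089552239])
decoded = encoded.map(inverse)
print(decoded.tolist())  # ['female', 'male', 'female']
```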

We can now go ahead and replace the categorical values with the numerical values:

.. code:: python

    train_t = encoder.transform(X_train)
    test_t = encoder.transform(X_test)

    print(train_t.head())

Below we see the resulting dataframe, where the categorical values are now replaced
with the target mean values:

.. code:: python

          pclass       sex        age  sibsp  parch     fare     cabin  embarked
    501        2  0.728358  13.000000      0      1  19.5000  0.304843  0.338957
    588        2  0.728358   4.000000      1      1  23.0000  0.304843  0.338957
    402        2  0.728358  30.000000      1      0  13.8583  0.304843  0.553073
    1193       3  0.187608  29.881135      0      0   7.7250  0.304843  0.373494
    686        3  0.728358  22.000000      0      0   7.7250  0.304843  0.373494

Mean encoding with smoothing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, :class:`MeanEncoder()` determines the mean target values without blending.
If we want to apply smoothing to control the cardinality of the variable and avoid
overfitting, we set up the transformer as follows:

.. code:: python

    encoder = MeanEncoder(
        variables=None,
        smoothing="auto",
    )

    encoder.fit(X_train, y_train)

In this example, we did not indicate which variables to encode. :class:`MeanEncoder()` can
automatically find the categorical variables, which are stored in one of its attributes:

.. code:: python

    encoder.variables_

Below we see the categorical features found by :class:`MeanEncoder()`:

.. code:: python

    ['sex', 'cabin', 'embarked']

We can find the categorical mappings calculated by the mean encoder:

.. code:: python

    encoder.encoder_dict_

Note that these values are different from those determined without smoothing:

.. code:: python

    {'sex': {'female': 0.7275051072923914, 'male': 0.18782635616273297},
     'cabin': {'A': 0.5210189753697639,
      'B': 0.755161569137655,
      'C': 0.5608140829162441,
      'D': 0.7100896537503179,
      'E': 0.7100896537503179,
      'F': 0.6501082490288561,
      'G': 0.47606795923242295,
      'M': 0.3049458046855866,
      'T': 0.0},
     'embarked': {'C': 0.552100581239763,
      'Missing': 1.0,
      'Q': 0.3736336816011083,
      'S': 0.3390242994568531}}

We can now go ahead and replace the categorical values with the numerical values:

.. code:: python

    train_t = encoder.transform(X_train)
    test_t = encoder.transform(X_test)

    print(train_t.head())

Below we see the resulting dataframe with the encoded features:

.. code:: python

          pclass       sex        age  sibsp  parch     fare     cabin  embarked
    501        2  0.727505  13.000000      0      1  19.5000  0.304946  0.339024
    588        2  0.727505   4.000000      1      1  23.0000  0.304946  0.339024
    402        2  0.727505  30.000000      1      0  13.8583  0.304946  0.552101
    1193       3  0.187826  29.881135      0      0   7.7250  0.304946  0.373634
    686        3  0.727505  22.000000      0      0   7.7250  0.304946  0.373634

We can now use these dataframes to train machine learning models for regression or
classification.

Mean encoding variables with numerical values
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:class:`MeanEncoder()`, like all Feature-engine encoders, is designed to work with
variables of type object or categorical by default. If you want to encode variables that
are numeric, you need to instruct the transformer to ignore the data type:

.. code:: python

    encoder = MeanEncoder(
        variables=['cabin', 'pclass'],
        ignore_format=True,
    )

    t_train = encoder.fit_transform(X_train, y_train)
    t_test = encoder.transform(X_test)

After encoding the features, we can use the datasets to train machine learning algorithms.

One last thing to note before closing is that mean encoding does not increase the
dimensionality of the resulting dataframes: from 1 categorical feature, we obtain 1
encoded variable. Hence, this encoding method is suitable for predictive modeling with
models that are sensitive to the size of the feature space.
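To close, here is a sketch of the full workflow with a toy dataframe and scikit-learn, showing that one categorical column yields exactly one encoded column for the model. The data is made up, and we use a manual mapping rather than :class:`MeanEncoder()` so the example is self-contained:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Made-up toy data: one categorical feature and a binary target.
train = pd.DataFrame({"cabin": ["A", "A", "B", "B", "B", "C", "C"],
                      "survived": [1, 0, 1, 1, 0, 0, 1]})

# Learn the per-category target mean on the training data only.
mapping = train.groupby("cabin")["survived"].mean()

# One categorical column in, one numeric column out: no growth in dimensionality.
X = train["cabin"].map(mapping).to_frame()
y = train["survived"]

model = LogisticRegression().fit(X, y)
print(model.coef_.shape)  # a single coefficient for the single encoded column
```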

Additional resources
--------------------

In the following notebook, you can find more details on the :class:`MeanEncoder()`
functionality and example plots with the encoded variables:

- `Jupyter notebook `_

For tutorials about this and other feature engineering methods, check out these resources:

.. figure:: ../../images/feml.png
   :width: 300
   :figclass: align-center
   :align: left
   :target: https://www.trainindata.com/p/feature-engineering-for-machine-learning

   Feature Engineering for Machine Learning

.. figure:: ../../images/fetsf.png
   :width: 300
   :figclass: align-center
   :align: right
   :target: https://www.trainindata.com/p/feature-engineering-for-forecasting

   Feature Engineering for Time Series Forecasting

Or read our book:

.. figure:: ../../images/cookbook.png
   :width: 200
   :figclass: align-center
   :align: left
   :target: https://packt.link/0ewSo

   Python Feature Engineering Cookbook

Both our book and courses are suitable for beginners and more advanced data scientists
alike. By purchasing them you are supporting Sole, the main developer of Feature-engine.