ReciprocalTransformer#

A reciprocal transformation replaces each data value x with its reciprocal, 1/x. This transformation is useful for addressing heteroscedasticity, where the variability of the errors in a regression model differs across values of an independent variable, and for transforming skewed distributions into more symmetric ones. It can also linearize certain nonlinear relationships, making them easier to model with linear regression, and it can improve the fit of a linear model by reducing the influence of outliers or by normalizing the residuals.

Applications#

The reciprocal transformation is useful for ratios, where the values of a variable result from the division of two variables. Examples include the student-teacher ratio (students per teacher) or crop yield (tons per acre).

By calculating the inverse of these variables, we shift from representing students per teacher to teachers per student, or from tons per acre to acres per ton. The transformed variable still makes intuitive sense, and its values can be better spread and follow a normal distribution more closely.
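As a small illustration of this shift, the sketch below inverts a handful of made-up student-teacher ratios. The values are hypothetical and chosen only to make the arithmetic easy to follow; they do not come from any dataset.

```python
# Hypothetical student-teacher ratios (students per teacher); illustrative values only.
students_per_teacher = [10.0, 12.5, 20.0, 25.0, 50.0]

# The reciprocal turns "students per teacher" into "teachers per student".
teachers_per_student = [1 / ratio for ratio in students_per_teacher]

for spt, tps in zip(students_per_teacher, teachers_per_student):
    print(f"{spt:5.1f} students per teacher -> {tps:.3f} teachers per student")
```

Note that the order of the values is reversed, and that the long right tail is compressed: the gap between 25 and 50 students per teacher becomes as small as the gap between 10 and 12.5.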

Properties#

Reciprocal transformation of x is 1 / x
The inverse of the reciprocal transformation is also the reciprocal transformation
The range of the reciprocal function includes all real numbers except 0

Although in theory the reciprocal function is defined for both positive and negative values, in practice it is mostly used to transform strictly positive variables.
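The properties above can be checked with a few lines of plain Python. This is only a sketch of the underlying math: the helper name reciprocal is our own and it is not how ReciprocalTransformer is implemented.

```python
import math


def reciprocal(x: float) -> float:
    """Return 1 / x; raises ZeroDivisionError when x is 0."""
    return 1 / x


values = [0.5, 2.0, 4.0, -8.0]

# Applying the reciprocal twice recovers the original values,
# so the function is its own inverse.
round_trip = [reciprocal(reciprocal(v)) for v in values]
print(all(math.isclose(a, b) for a, b in zip(values, round_trip)))

# Zero has no reciprocal, so it must be handled before transforming.
try:
    reciprocal(0.0)
except ZeroDivisionError:
    zero_is_undefined = True
else:
    zero_is_undefined = False
print(zero_is_undefined)
```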

ReciprocalTransformer#

The ReciprocalTransformer applies the reciprocal transformation to numerical variables. By default, it will find and transform all numerical variables in the dataset. A better practice would be to apply the transformer to a selected group of variables, which you can do by passing a list with the variable names to the variables parameter when setting up the transformer.

If any of the variables contains the value 0, the transformer will raise an error.
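One way to anticipate this error is to look for zeros before fitting. The sketch below uses plain pandas on a hypothetical toy DataFrame (the column names are invented for illustration). It does not call ReciprocalTransformer; it only mirrors the kind of check described above.

```python
import pandas as pd

# Hypothetical toy data; the column names are made up for illustration.
df = pd.DataFrame({
    "ratio_a": [0.5, 2.0, 4.0],
    "ratio_b": [1.0, 0.0, 3.0],  # contains a zero
})

# Flag the columns where at least one value equals zero.
has_zero = (df == 0).any()
columns_with_zero = has_zero[has_zero].index.tolist()
print(columns_with_zero)

# Only the zero-free columns are safe candidates for a reciprocal transformation.
safe_columns = [col for col in df.columns if col not in columns_with_zero]
print(safe_columns)
```

The zero-free columns could then be passed to the variables parameter, while the remaining ones would need a different treatment.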

Python examples#

In the next sections, we’ll demonstrate how to apply the reciprocal transformation with ReciprocalTransformer.

We’ll load the Ames house prices dataset and create a new variable that represents the square feet per car in the house garage. Next, we’ll separate the data into train and test sets:

import matplotlib.pyplot as plt

from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from feature_engine.transformation import ReciprocalTransformer

data = fetch_openml(name='house_prices', as_frame=True)
data = data.frame

data["sqrfootpercar"] = data['GarageArea'] / data['GarageCars']
data = data[~data["sqrfootpercar"].isna()]

y = data['SalePrice']
X = data[['GarageCars', 'GarageArea', "sqrfootpercar"]]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print(X_train.head())

In the following output we see the resulting dataset:

   GarageCars  GarageArea  sqrfootpercar
            1         358          358.0
            1         352          352.0
            1         264          264.0
            2         540          270.0
            2         528          264.0

Let’s plot the distribution of the variable with the square foot area per car in a garage:

X_train["sqrfootpercar"].hist(bins=50, figsize=(4,4))
plt.title("sqrfootpercar")
plt.show()

In the following image we can see the skewness of the variable:

../../_images/reciprocal_transfomer_original.png

Let’s now apply the reciprocal transformation to this variable:

tf = ReciprocalTransformer(variables="sqrfootpercar")

train_t = tf.fit_transform(X_train)
test_t = tf.transform(X_test)

Finally, let’s plot the distribution after the reciprocal transformation:

train_t["sqrfootpercar"].hist(bins=50, figsize=(4,4))
plt.title("sqrfootpercar")
plt.show()

In the following image, we see that after the reciprocal transformation the variable’s values more closely follow a symmetric or normal distribution:

../../_images/reciprocal_transfomer_new.png

Inverse transformation#

With ReciprocalTransformer, we can easily revert the transformed data to its original representation by using the method inverse_transform:

train_unt = tf.inverse_transform(train_t)
test_unt = tf.inverse_transform(test_t)

Let’s check out the reverted transformation:

train_unt["sqrfootpercar"].hist(bins=50, figsize=(4,4))
plt.title("sqrfootpercar")
plt.show()

As you can see in the following image, we obtained the original data by re-applying the reciprocal function to the transformed variable:

../../_images/reciprocal_transfomer_inverse.png

Pipeline of transformations#

As we mentioned previously, the reciprocal transformation is, in general, suitable for ratio variables, so we need to transform the other variables in the dataset with other types of transformations.

Let’s now plot the distribution of the 3 variables in the original data to see which transformations could be suitable for them:

X_train.hist(bins=50, figsize=(10,10))
plt.show()

In the following plot, we can see that, as expected, GarageCars contains counts (potentially following a Poisson distribution), and GarageArea is a continuous variable:

../../_images/reciprocal_transformer_3plots_original.png

Let’s then create a pipeline to apply the square root transformation to GarageCars and the Box-Cox transformation to GarageArea, while applying the reciprocal transformation to “sqrfootpercar”:

from feature_engine.pipeline import Pipeline
from feature_engine.transformation import PowerTransformer, BoxCoxTransformer

pipe = Pipeline([
    ("reciprocal", ReciprocalTransformer(variables="sqrfootpercar")),
    ("sqrroot", PowerTransformer(variables="GarageCars", exp=1/2)),
    ("boxcox", BoxCoxTransformer(variables="GarageArea")),
])

Let’s now fit the pipeline and transform the datasets:

train_t = pipe.fit_transform(X_train)
test_t = pipe.transform(X_test)

And now, we can corroborate how these transformations improved the value spread across all variables by plotting the histograms for the transformed data:

train_t.hist(bins=50, figsize=(10,10))
plt.show()

In the following image, we can see that the variables no longer show the right-skewness, and now their values are more symmetrically distributed across their value ranges:

../../_images/reciprocal_transformer_3plots_new.png

And that’s it! We’ve now applied different mathematical functions to stabilize the variance of the variables in the dataset.

Alternatives to the reciprocal function#

We mentioned that the reciprocal function is used, in practice, with positive values. If the variable contains negative values, the Yeo-Johnson transformation, or adding a constant followed by the Box-Cox transformation might be better choices.

If the variable does not come from a ratio, the log transform or the arcsine transformation can be used instead.

If the variable contains counts, then the square root transformation is better suited.

The Box-Cox transformation automates the process of finding the best transformation by exploring several functions automatically.
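To make the link between Box-Cox and the other functions concrete, here is a minimal sketch of the standard Box-Cox formula, (x^λ − 1) / λ for λ ≠ 0 and ln(x) for λ = 0, written with the standard library only. The function name box_cox and the sample value are our own, and this is not Feature-engine’s implementation; in particular, it takes λ as an input instead of estimating it from the data.

```python
import math


def box_cox(x: float, lmbda: float) -> float:
    """Standard Box-Cox formula for a strictly positive x and a given lambda."""
    if x <= 0:
        raise ValueError("Box-Cox is only defined for strictly positive values.")
    if lmbda == 0:
        return math.log(x)
    return (x ** lmbda - 1) / lmbda


x = 4.0
print(box_cox(x, -1))   # equals 1 - 1/x, a shifted and sign-flipped reciprocal
print(box_cox(x, 0))    # equals the natural logarithm of x
print(box_cox(x, 0.5))  # equals 2 * (sqrt(x) - 1), a rescaled square root
```

Under this formula, the reciprocal, the logarithm and the square root appear as special cases of one family, up to shifting and rescaling.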

All these functions are considered variance stabilizing transformations and have been designed to transform data so that it meets the assumptions of parametric statistical tests and linear regression models.

You can apply all these functions out-of-the-box with the transformers from Feature-engine’s transformation module. Remember to follow up the transformations with proper data analysis to ensure that they had the desired effect. Otherwise, we are adding complexity to the feature engineering pipeline for no added benefit.

Alternatives with Feature-engine#

You can apply other variance stabilizing transformations with the following transformers:

LogTransformer: applies logarithmic transformation
ArcsinTransformer: applies arcsin transformation
PowerTransformer: applies power transformation including sqrt
BoxCoxTransformer: applies the Box-Cox transformation
YeoJohnsonTransformer: applies the Yeo-Johnson transformation

Additional resources#

You can find more details about the ReciprocalTransformer() here:

Jupyter notebook

For more details about this and other feature engineering methods check out these resources:

Feature Engineering for Machine Learning#

Or read our book:

Python Feature Engineering Cookbook#

Both our book and course are suitable for beginners and more advanced data scientists alike. By purchasing them you are supporting Sole, the main developer of Feature-engine.
