.. _log_transformer:

.. currentmodule:: feature_engine.transformation

LogTransformer
==============
The :class:`LogTransformer()` applies the logarithm to the indicated variables. Note
that the logarithm is only defined for positive values. Thus, if a variable contains
0 or negative values, this transformer will raise an error.
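Because of this restriction, it can help to check which variables contain non-positive values before setting up the transformer. The snippet below is a minimal sketch with pandas only; the toy data and the column names ``area`` and ``balance`` are hypothetical and not part of the house prices dataset.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "area": [120.0, 85.5, 240.0],
    "balance": [10.0, 0.0, -3.5],
})

# Flag every column that contains at least one value <= 0.
non_positive = (df <= 0).any()
unsafe_columns = non_positive[non_positive].index.tolist()

# Columns where the logarithm is defined for every row.
safe_columns = [col for col in df.columns if col not in unsafe_columns]

print("Not suitable for the logarithm:", unsafe_columns)
print("Suitable for the logarithm:", safe_columns)
```

Only the columns in ``safe_columns`` would be reasonable candidates for the ``variables`` argument.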
**Example**

Let's load the house prices dataset and separate it into train and test sets (more
details about the dataset :ref:`here `).

.. code:: python

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split

    from feature_engine import transformation as vt

    # Load dataset
    data = pd.read_csv('houseprice.csv')

    # Separate into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(
        data.drop(['Id', 'SalePrice'], axis=1),
        data['SalePrice'], test_size=0.3, random_state=0)
Now we want to apply the logarithm to 2 of the variables in the dataset using the
:class:`LogTransformer()`.

.. code:: python

    # set up the variable transformer
    tf = vt.LogTransformer(variables=['LotArea', 'GrLivArea'])

    # fit the transformer
    tf.fit(X_train)
With `fit()`, this transformer does not learn any parameters, so we can go ahead and
transform the variables.

.. code:: python

    # transform the data
    train_t = tf.transform(X_train)
    test_t = tf.transform(X_test)
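Since nothing is learned during fitting, the transformation amounts to taking the logarithm of the selected columns and leaving the rest of the dataframe unchanged. The following is a rough sketch of that idea in plain NumPy and pandas, assuming the natural logarithm; the toy values and the ``YrSold`` column are hypothetical stand-ins, and this is not the library's actual implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the training set.
X = pd.DataFrame({
    "LotArea": [8450.0, 9600.0, 11250.0],
    "GrLivArea": [1710.0, 1262.0, 1786.0],
    "YrSold": [2008, 2007, 2008],
})
variables = ["LotArea", "GrLivArea"]

# Work on a copy so the original dataframe stays untouched.
X_t = X.copy()

# Apply the natural logarithm to the selected columns only.
X_t[variables] = np.log(X_t[variables])

print(X_t)
```

Columns that are not listed in ``variables`` keep their original values.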
Next, we make a histogram of the original variable distribution:

.. code:: python

    # un-transformed variable
    X_train['LotArea'].hist(bins=50)

.. image:: ../../images/lotarearaw.png
And now, we can explore the distribution of the variable after the logarithm transformation:

.. code:: python

    # transformed variable
    train_t['LotArea'].hist(bins=50)

.. image:: ../../images/lotarealog.png

Note that the transformed variable has a more Gaussian-looking distribution.
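One way to put a number on this visual impression is to compare the skewness before and after the transformation, since a symmetric, Gaussian-like distribution has skewness close to zero. The sketch below uses synthetic right-skewed data drawn from a log-normal distribution as a hypothetical stand-in for a variable like ``LotArea``; it does not use the house prices dataset.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed data (assumption: log-normal, seeded for repeatability).
rng = np.random.default_rng(0)
raw = pd.Series(rng.lognormal(mean=9.0, sigma=0.5, size=1000))

# Natural logarithm of the synthetic variable.
logged = np.log(raw)

# Sample skewness before and after the transformation.
raw_skew = raw.skew()
log_skew = logged.skew()

print("Skewness before:", raw_skew)
print("Skewness after:", log_skew)
```

With real data, the same comparison could be run on the original and the transformed column.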
More details
^^^^^^^^^^^^
You can find more details about the :class:`LogTransformer()` here:
- `Jupyter notebook `_
For more details about this and other feature engineering methods check out these resources:
- `Feature engineering for machine learning `_, online course.
- `Python Feature Engineering Cookbook `_, book.