CyclicalTransformer

API Reference

class feature_engine.creation.CyclicalTransformer(variables=None, max_values=None, drop_original=False)

The CyclicalTransformer() applies cyclical transformations to numerical variables. The transformation returns two new features per variable, according to:

  • var_sin = sin(variable * (2. * pi / max_value))

  • var_cos = cos(variable * (2. * pi / max_value))

where max_value is the maximum value of the variable (learned from the data or supplied through max_values), and pi is 3.14…
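As a minimal sketch of the two formulas above, the following uses only the Python standard library. The helper name cyclical_pair is hypothetical and not part of feature_engine.

```python
import math

def cyclical_pair(value, max_value):
    """Return (sin, cos) of value mapped onto a circle of period max_value."""
    angle = value * (2.0 * math.pi / max_value)
    return math.sin(angle), math.cos(angle)

# Month 3 of 12 sits a quarter of the way around the circle.
var_sin, var_cos = cyclical_pair(3, 12)
print(round(var_sin, 5), round(var_cos, 5))  # prints: 1.0 0.0
```

Each input value becomes a point on the unit circle, so the pair (var_sin, var_cos) always satisfies sin² + cos² = 1.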

Motivation: Some features are cyclical by nature, for example, the hours of a day or the months of a year. In these cases, the highest values of the variable are close to the lowest values. For example, December (12) is closer to January (1) than to June (6). A cyclical transformation captures this proximity between the two ends of the range.
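The December/January claim can be checked numerically. The sketch below assumes months coded 1 to 12 with a maximum of 12 and compares Euclidean distances between the encoded points; the encode helper is hypothetical and written only for this illustration.

```python
import math

def encode(value, max_value):
    """Map value onto the unit circle with period max_value."""
    angle = value * (2.0 * math.pi / max_value)
    return (math.sin(angle), math.cos(angle))

dec, jan, jun = encode(12, 12), encode(1, 12), encode(6, 12)

# On the raw scale, |12 - 1| = 11 is larger than |12 - 6| = 6.
# After encoding, the ordering is reversed.
dec_to_jan = math.dist(dec, jan)
dec_to_jun = math.dist(dec, jun)
print(round(dec_to_jan, 4), round(dec_to_jun, 4))  # prints: 0.5176 2.0
```

December and June land on opposite sides of the circle, while December and January are neighbours one twelfth of a turn apart.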

The CyclicalTransformer() works only with numerical variables. Missing data should be imputed before applying this transformer.

A list of variables can be passed as an argument. Alternatively, the transformer will automatically select and transform all numerical variables.
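The automatic selection can be approximated in plain pandas. This is only a sketch: it assumes that "numerical" corresponds to pandas' numeric dtypes, which may differ from the check the transformer performs internally, and the dataframe is invented for the illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "day": [6, 7, 5],
    "months": [3, 7, 9],
    "city": ["Lima", "Oslo", "Rome"],  # non-numerical, should be skipped
})

# Assumption: numeric dtypes stand in for "numerical variables".
numerical = list(df.select_dtypes(include="number").columns)
print(numerical)  # prints: ['day', 'months']
```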

Parameters
variables: list, default=None

The list of numerical variables to transform. If None, the transformer will automatically find and select all numerical variables.

max_values: dict, default=None

A dictionary with the maximum value of each variable to transform. Useful when the maximum value is not present in the dataset. If None, the transformer will automatically find the maximum value of each variable.
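To illustrate why a supplied maximum can matter, here is a small sketch assuming an hour-of-day feature with a period of 24 whose sample never reaches the later hours. The encode helper and the variable names are hypothetical, not part of feature_engine; declared_max plays the role that an entry in max_values would play.

```python
import math

def encode(value, max_value):
    """Map value onto the unit circle with period max_value."""
    angle = value * (2.0 * math.pi / max_value)
    return (math.sin(angle), math.cos(angle))

hours_seen = [0, 6, 12, 18]    # assumed sample: later hours never occur
learned_max = max(hours_seen)  # 18, the maximum found in the data
declared_max = 24              # assumed true period, supplied by the user

# With the learned maximum, hour 18 wraps all the way around onto hour 0.
gap_learned = math.dist(encode(18, learned_max), encode(0, learned_max))
# With the declared maximum, hour 18 sits three quarters of the way around.
gap_declared = math.dist(encode(18, declared_max), encode(0, declared_max))

print(gap_learned < 1e-9, round(gap_declared, 4))  # prints: True 1.4142
```

When the maximum is taken from an incomplete sample, two distinct hours collapse onto the same point; declaring the maximum keeps them apart.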

drop_original: bool, default=False

If True, the original variables to transform will be dropped from the dataframe.

Attributes

max_values_:

The maximum value of each cyclical feature.

variables_:

The group of variables that will be transformed.

n_features_in_:

The number of features in the train set used in fit.

References

http://blog.davidkaleko.com/feature-engineering-cyclical-features.html

Methods

fit:

Learns the maximum values of the cyclical features.

transform:

Applies the cyclical transformation, creating two new features per variable.

fit_transform:

Fit to data, then transform it.

fit(X, y=None)

Learns the maximum value of each of the cyclical variables.

Parameters
X: pandas dataframe of shape = [n_samples, n_features]

The training input samples. Can be the entire dataframe, not just the variables to transform.

y: pandas Series, default=None

It is not needed in this transformer. You can pass y or None.

Returns
self
Raises
TypeError

If the input is not a Pandas DataFrame.

ValueError
  • If some of the columns contain NaNs.

  • If some of the keys in max_values are not present in variables.
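The documented behaviour of fit, including the three error conditions, can be sketched in plain pandas as follows. The function learn_max_values is a hypothetical stand-in written for illustration; it is not the library's implementation, and the error messages are invented.

```python
import pandas as pd

def learn_max_values(X, variables, max_values=None):
    """Hypothetical stand-in for the documented fit step."""
    if not isinstance(X, pd.DataFrame):
        raise TypeError("X must be a pandas DataFrame.")
    if X[variables].isnull().any().any():
        raise ValueError("Some of the variables to transform contain NaN.")
    if max_values is not None:
        unknown = set(max_values) - set(variables)
        if unknown:
            raise ValueError(f"Keys not present in variables: {sorted(unknown)}")
        return dict(max_values)  # user-supplied maxima take precedence
    # Otherwise learn the maximum of each variable from the data.
    return X[variables].max().to_dict()

df = pd.DataFrame({
    "day": [6, 7, 5, 3, 1, 2, 4],
    "months": [3, 7, 9, 12, 4, 6, 12],
})
learned = learn_max_values(df, ["day", "months"])
print(learned)
```

For this dataframe the learned maxima are 7 for day and 12 for months.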

transform(X)

Creates new features using the cyclical transformation.

Parameters
X: Pandas DataFrame of shape = [n_samples, n_features]

The data to be transformed.

Returns
X: Pandas dataframe.

The dataframe with the additional new features. The original variables will be dropped if drop_original is True, or retained otherwise.
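A rough pandas/NumPy sketch of the transform step, including the effect of the drop_original parameter, is shown below. The function add_cyclical_features is hypothetical and only mirrors the documented behaviour; it is not the library's code.

```python
import numpy as np
import pandas as pd

def add_cyclical_features(X, max_values, drop_original=False):
    """Hypothetical stand-in for the documented transform step."""
    X = X.copy()  # leave the caller's dataframe untouched
    for name, max_value in max_values.items():
        angle = X[name] * (2.0 * np.pi / max_value)
        X[f"{name}_sin"] = np.sin(angle)
        X[f"{name}_cos"] = np.cos(angle)
    if drop_original:
        X = X.drop(columns=list(max_values))
    return X

df = pd.DataFrame({"day": [6, 7, 5], "months": [3, 7, 9]})
maxima = {"day": 7, "months": 12}

kept = add_cyclical_features(df, maxima)
dropped = add_cyclical_features(df, maxima, drop_original=True)
print(list(kept.columns))
print(list(dropped.columns))
```

With drop_original=False the result has six columns (the two originals plus four new ones); with drop_original=True only the four sine and cosine columns remain.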

Raises
TypeError

If the input is not a Pandas DataFrame.

Example

import pandas as pd

from feature_engine.creation import CyclicalTransformer

df = pd.DataFrame({
    'day': [6, 7, 5, 3, 1, 2, 4],
    'months': [3, 7, 9, 12, 4, 6, 12],
    })

cyclical = CyclicalTransformer(variables=None, drop_original=True)

X = cyclical.fit_transform(df)
print(cyclical.max_values_)
{'day': 7, 'months': 12}
print(X)
    day_sin    day_cos  months_sin  months_cos
0  -0.78183    0.62349         1.0         0.0
1       0.0        1.0        -0.5    -0.86603
2  -0.97493  -0.222521        -1.0        -0.0
3   0.43388  -0.900969         0.0         1.0
4   0.78183    0.62349     0.86603        -0.5
5   0.97493  -0.222521         0.0        -1.0
6  -0.43388  -0.900969         0.0         1.0