BoxCoxTransformer
- class feature_engine.transformation.BoxCoxTransformer(variables=None)
The BoxCoxTransformer() applies the BoxCox transformation to numerical variables.
The Box-Cox transformation is defined as:
T(Y) = (Y^λ − 1) / λ if λ ≠ 0
T(Y) = log(Y) if λ = 0
where Y is the response variable and λ is the transformation parameter. λ typically ranges from -5 to 5. During fitting, the optimal value of λ for each variable is estimated; SciPy selects the value that maximizes the log-likelihood function.
The BoxCox transformation implemented by this transformer is that of SciPy.stats: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.boxcox.html
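The BoxCoxTransformer() works only with strictly positive numerical variables (> 0). SciPy raises an error when it is asked to estimate λ from data that contains zero or negative values.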
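The formula and the positivity requirement can be checked against SciPy directly. The following is a minimal sketch, not the transformer's own code: `box_cox` is a hypothetical helper written for illustration, and the sample data is randomly generated.

```python
import numpy as np
from scipy import stats


def box_cox(y, lam):
    """Apply the Box-Cox formula for a fixed lambda; y must be > 0."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam


rng = np.random.default_rng(0)
y = rng.lognormal(size=200)  # strictly positive sample data

# With no lambda given, SciPy estimates it by maximizing the log-likelihood
# and returns both the transformed data and the fitted lambda.
transformed, fitted_lambda = stats.boxcox(y)

# Applying the formula by hand with the fitted lambda reproduces SciPy's output.
manual = box_cox(y, fitted_lambda)
print(np.allclose(manual, transformed))

# Estimating lambda from data that contains a zero raises a ValueError.
try:
    stats.boxcox(np.array([0.0, 1.0, 2.0, 3.0]))
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```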
A list of variables can be passed as an argument. Alternatively, the transformer will automatically select and transform all numerical variables.
More details in the User Guide.
- Parameters
- variables: list, default=None
The list of numerical variables to transform. If None, the transformer will automatically find and select all numerical variables.
- Attributes
- lambda_dict_:
Dictionary with the best BoxCox exponent per variable.
- variables_:
The group of variables that will be transformed.
- feature_names_in_:
List with the names of features seen during fit.
- n_features_in_:
The number of features in the train set used in fit.
References
- [1] Box and Cox. “An Analysis of Transformations”. Read at a RESEARCH MEETING, 1964. https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.2517-6161.1964.tb00553.x
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from feature_engine.transformation import BoxCoxTransformer
>>> np.random.seed(42)
>>> X = pd.DataFrame(dict(x = np.random.lognormal(size = 100)))
>>> bct = BoxCoxTransformer()
>>> bct.fit(X)
>>> X = bct.transform(X)
>>> X.head()
          x
0  0.505485
1 -0.137595
2  0.662654
3  1.607518
4 -0.232237
Methods
fit:
Learn the optimal lambda for the BoxCox transformation.
fit_transform:
Fit to data, then transform it.
get_feature_names_out:
Get output feature names for transformation.
get_params:
Get parameters for this estimator.
set_params:
Set the parameters of this estimator.
inverse_transform:
Convert the data back to the original representation.
transform:
Apply the BoxCox transformation.
- fit(X, y=None)
Learn the optimal lambda for the BoxCox transformation.
- Parameters
- X: pandas dataframe of shape = [n_samples, n_features]
The training input samples. Can be the entire dataframe, not just the variables to transform.
- y: pandas Series, default=None
It is not needed in this transformer. You can pass y or None.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
- X: array-like of shape (n_samples, n_features)
Input samples.
- y: array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params: dict
Additional fit parameters.
- Returns
- X_new: ndarray of shape (n_samples, n_features_new)
Transformed array.
- get_feature_names_out(input_features=None)
Get output feature names for transformation. In other words, returns the variable names of transformed dataframe.
- Parameters
- input_features: array or list, default=None
This parameter exists only for compatibility with the Scikit-learn pipeline.
If None, then feature_names_in_ is used as the input feature names.
If an array or list, then input_features must match feature_names_in_.
- Returns
- feature_names_out: list
Transformed feature names.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
- deep: bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- params: dict
Parameter names mapped to their values.
- inverse_transform(X)
Convert the data back to the original representation.
- Parameters
- X: Pandas DataFrame of shape = [n_samples, n_features]
The data to be inverse transformed.
- Returns
- X_new: pandas dataframe
The dataframe with the original variables.
- Return type
DataFrame
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
- **params: dict
Estimator parameters.
- Returns
- self: estimator instance
Estimator instance.