ArcSinhTransformer
- class feature_engine.transformation.ArcSinhTransformer(variables=None, loc=0.0, scale=1.0)
The ArcSinhTransformer() applies the inverse hyperbolic sine transformation (arcsinh) to numerical variables. Also known as the pseudo-logarithm, this transformation is useful for data that contains both positive and negative values.
The transformation is: x → arcsinh((x - loc) / scale)
For large |x|, arcsinh(x) behaves like sign(x) · (ln|x| + ln 2), which gives it variance-stabilizing properties similar to those of the log transformation. For small |x|, it is approximately linear (i.e., arcsinh(x) ≈ x). Unlike the logarithm, it is defined at zero and for negative values, which makes it well suited to variables such as net worth, profit/loss, or any metric that can be positive or negative.
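The two regimes described above can be checked numerically. The following is a minimal sketch using only Python's standard `math` module; the helper name `large_x_approx` is hypothetical and is not part of feature_engine.

```python
import math


def large_x_approx(x):
    """Hypothetical helper: log-like approximation sign(x) * (ln|x| + ln 2)."""
    return math.copysign(math.log(abs(x)) + math.log(2.0), x)


# Large |x|: arcsinh(x) and the log-like approximation nearly coincide.
for x in (1_000.0, -1_000.0):
    print(x, math.asinh(x), large_x_approx(x))

# Small |x|: arcsinh(x) stays close to x itself.
for x in (0.01, -0.01):
    print(x, math.asinh(x))

# Unlike ln(x), arcsinh is defined at zero.
print(math.asinh(0.0))
```

The gap between arcsinh(x) and the log-like approximation shrinks as |x| grows, so the approximation is only meaningful far from zero.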
A list of variables can be passed as an argument. Alternatively, the transformer will automatically select and transform all numerical variables.
More details in the User Guide.
- Parameters
- variables: list, default=None
The list of numerical variables to transform. If None, the transformer will automatically find and select all numerical variables.
- loc: float, default=0.0
Location parameter for shifting the data before transformation. The transformation becomes: arcsinh((x - loc) / scale)
- scale: float, default=1.0
Scale parameter for normalizing the data before transformation. Must be greater than 0. The transformation becomes: arcsinh((x - loc) / scale)
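To illustrate how loc and scale enter the formula, here is a minimal sketch on plain floats. The function `arcsinh_transform` is a hypothetical stand-in, not the library's implementation, and the `ValueError` raised for a non-positive scale is an assumption about how the "greater than 0" constraint might be enforced.

```python
import math


def arcsinh_transform(x, loc=0.0, scale=1.0):
    """Hypothetical helper mirroring the documented formula arcsinh((x - loc) / scale)."""
    if scale <= 0:
        # Assumption: the documented constraint is enforced with a ValueError.
        raise ValueError("scale must be greater than 0")
    return math.asinh((x - loc) / scale)


values = [-2500.0, -10.0, 0.0, 10.0, 2500.0]

# Default parameters: plain arcsinh(x).
default = [arcsinh_transform(v) for v in values]

# A larger scale shrinks the inputs first, so more of them fall in the
# near-linear region around zero.
rescaled = [arcsinh_transform(v, loc=0.0, scale=1000.0) for v in values]

for v, d, r in zip(values, default, rescaled):
    print(v, d, r)
```

In this sketch, loc sets the value that maps to zero and scale sets roughly where the transformation switches from linear to logarithmic behaviour.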
- Attributes
- variables_:
The group of variables that will be transformed.
- feature_names_in_:
List with the names of features seen during fit.
- n_features_in_:
The number of features in the train set used in fit.
See also
feature_engine.transformation.LogTransformer: Applies log transformation (only for positive values).
feature_engine.transformation.YeoJohnsonTransformer: Applies Yeo-Johnson transformation.
References
- 1
Burbidge, J. B., Magee, L., & Robb, A. L. (1988). Alternative transformations to handle extreme values of the dependent variable. Journal of the American Statistical Association, 83(401), 123-127.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from feature_engine.transformation import ArcSinhTransformer
>>> np.random.seed(42)
>>> X = pd.DataFrame(dict(x = np.random.randn(100) * 1000))
>>> ast = ArcSinhTransformer()
>>> ast.fit(X)
>>> X = ast.transform(X)
>>> X.head().round(2)
      x
0  6.90
1 -5.62
2  7.17
3  8.02
4 -6.15
Methods
fit:
This transformer does not learn parameters.
fit_transform:
Fit to data, then transform it.
get_feature_names_out:
Get output feature names for transformation.
get_params:
Get parameters for this estimator.
set_params:
Set the parameters of this estimator.
inverse_transform:
Convert the data back to the original representation.
transform:
Transform the variables using the arcsinh function.
- fit(X, y=None)
Selects the numerical variables and stores feature names.
- Parameters
- X: Pandas DataFrame of shape = [n_samples, n_features].
The training input samples. Can be the entire dataframe, not just the variables to transform.
- y: pandas Series, default=None
It is not needed in this transformer. You can pass y or None.
- Returns
- self: ArcSinhTransformer
The fitted transformer.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
- X: array-like of shape (n_samples, n_features)
Input samples.
- y: array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params: dict
Additional fit parameters. Pass only if the estimator accepts additional params in its fit method.
- Returns
- X_new: ndarray of shape (n_samples, n_features_new)
Transformed array.
- get_feature_names_out(input_features=None)
Get output feature names for transformation. In other words, returns the variable names of the transformed dataframe.
- Parameters
- input_features: array or list, default=None
This parameter exists only for compatibility with the Scikit-learn pipeline.
If None, then feature_names_in_ is used as the input feature names.
If an array or list, then input_features must match feature_names_in_.
- Returns
- feature_names_out: list
Transformed feature names.
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns
- routing: MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
- deep: bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- params: dict
Parameter names mapped to their values.
- inverse_transform(X)
Convert the data back to the original representation.
- Parameters
- X: Pandas DataFrame of shape = [n_samples, n_features]
The data to be inverse transformed.
- Returns
- X_tr: pandas dataframe
The dataframe with the inverse transformed variables.
- rtype
DataFrame
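As an illustration of what the inverse involves, the sketch below assumes inverse_transform applies the algebraic inverse of the forward formula, x = loc + scale · sinh(y). The functions `forward` and `inverse` are hypothetical helpers on plain floats, not the library's code.

```python
import math


def forward(x, loc=0.0, scale=1.0):
    """Hypothetical forward step: arcsinh((x - loc) / scale)."""
    return math.asinh((x - loc) / scale)


def inverse(y, loc=0.0, scale=1.0):
    """Hypothetical inverse step (assumed): loc + scale * sinh(y)."""
    return loc + scale * math.sinh(y)


original = [-1500.0, -3.2, 0.0, 42.0, 98000.0]

# Round trip with non-default parameters; the same loc and scale must be
# used in both directions for the values to be recovered.
recovered = [inverse(forward(v, 10.0, 50.0), 10.0, 50.0) for v in original]

for v, r in zip(original, recovered):
    print(v, r)
```

The round trip recovers the inputs up to floating-point error, since sinh is the exact inverse of arcsinh over the whole real line.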
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters
- **params: dict
Estimator parameters.
- Returns
- self: estimator instance
Estimator instance.