SklearnTransformerWrapper

class feature_engine.wrappers.SklearnTransformerWrapper(transformer, variables=None)

Wrapper to apply Scikit-learn transformers to a selected group of variables. It supports the following transformers:

Binarizer and KBinsDiscretizer (only when encode="ordinal")
FunctionTransformer, PowerTransformer and QuantileTransformer
SimpleImputer, IterativeImputer and KNNImputer (only when add_indicator=False)
OrdinalEncoder and OneHotEncoder (only when sparse_output=False; the parameter is named sparse in older Scikit-learn versions)
MaxAbsScaler, MinMaxScaler, StandardScaler, RobustScaler, Normalizer
All selection transformers including VarianceThreshold
PolynomialFeatures

More details in the User Guide.

Parameters

transformer: sklearn transformer: The desired Scikit-learn transformer.
variables: list, default=None: The list of variables to be transformed. If None, the wrapper selects all numeric variables for every transformer except SimpleImputer, OrdinalEncoder and OneHotEncoder, for which it selects all variables in the dataset.

Attributes

transformer_: The fitted Scikit-learn transformer.
variables_: The group of variables that will be transformed.
features_to_drop_: The variables that will be dropped. Only present when using selection transformers.
feature_names_in_: List with the names of features seen during fit.
n_features_in_: The number of features in the train set used in fit.

See also

sklearn.compose.ColumnTransformer

Notes

This transformer offers similar functionality to the ColumnTransformer from Scikit-learn, but it allows entering the transformations directly into a Pipeline and returns pandas dataframes.

Examples

>>> import pandas as pd
>>> from feature_engine.wrappers import SklearnTransformerWrapper
>>> from sklearn.preprocessing import StandardScaler
>>> X = pd.DataFrame(dict(x1 = ["a","b","c"], x2 = [1,2,3], x3 = [4,5,6]))
>>> skw = SklearnTransformerWrapper(StandardScaler())
>>> skw.fit(X)
>>> skw.transform(X)
  x1        x2        x3
0  a -1.224745 -1.224745
1  b  0.000000  0.000000
2  c  1.224745  1.224745

>>> import pandas as pd
>>> from feature_engine.wrappers import SklearnTransformerWrapper
>>> from sklearn.preprocessing import OneHotEncoder
>>> X = pd.DataFrame(dict(x1 = ["a","b","c"], x2 = [1,2,3], x3 = [4,5,6]))
>>> skw = SklearnTransformerWrapper(
...     OneHotEncoder(sparse_output = False), variables = "x1")
>>> skw.fit(X)
>>> skw.transform(X)
   x2  x3  x1_a  x1_b  x1_c
0   1   4   1.0   0.0   0.0
1   2   5   0.0   1.0   0.0
2   3   6   0.0   0.0   1.0

>>> import pandas as pd
>>> from feature_engine.wrappers import SklearnTransformerWrapper
>>> from sklearn.preprocessing import PolynomialFeatures
>>> X = pd.DataFrame(dict(x1 = ["a","b","c"], x2 = [1,2,3], x3 = [4,5,6]))
>>> skw = SklearnTransformerWrapper(PolynomialFeatures(include_bias = False))
>>> skw.fit(X)
>>> skw.transform(X)
  x1   x2   x3  x2^2  x2 x3  x3^2
0  a  1.0  4.0   1.0    4.0  16.0
1  b  2.0  5.0   4.0   10.0  25.0
2  c  3.0  6.0   9.0   18.0  36.0

Methods

fit:	Fit Scikit-learn transformer.
fit_transform:	Fit to data, then transform it.
get_feature_names_out:	Get output feature names for transformation.
get_params:	Get parameters for this estimator.
set_params:	Set the parameters of this estimator.
inverse_transform:	Convert the data back to the original representation.
transform:	Transform data with the Scikit-learn transformer.

fit(X, y=None)

Fits the Scikit-learn transformer to the selected variables.

Parameters

X: Pandas DataFrame: The dataset to fit the transformer.
y: pandas Series, default=None: The target variable.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

X: array-like of shape (n_samples, n_features): Input samples.
y: array-like of shape (n_samples,) or (n_samples, n_outputs), default=None: Target values (None for unsupervised transformations).
**fit_params: dict: Additional fit parameters. Pass only if the estimator accepts additional params in its fit method.

Returns

X_new: ndarray of shape (n_samples, n_features_new): Transformed array.

get_feature_names_out(input_features=None)

Get output feature names for transformation.

input_features: list, default=None: If None, the names of all the variables in the transformed dataset are returned. For transformers that create and add new features to the dataset, like OneHotEncoder or PolynomialFeatures, you can pass a list with the input features to obtain only the newly created variables. For all other transformers, this parameter is ignored.

Returns

feature_names_out: list: The feature names.

rtype: List

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns

routing: MetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep: bool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params: dict: Parameter names mapped to their values.

inverse_transform(X)

Convert the transformed variables back to the original values. Only implemented for the following Scikit-learn transformers:

PowerTransformer, QuantileTransformer, OrdinalEncoder, MaxAbsScaler, MinMaxScaler, StandardScaler, RobustScaler.

If you would like this method implemented for additional transformers, please check if they have the inverse_transform method in Scikit-learn and then raise an issue in our repo.

Parameters

X: pandas dataframe of shape = [n_samples, n_features].: The transformed dataframe.

Returns

X_tr: pandas dataframe of shape = [n_samples, n_features].: The dataframe with the original values.

rtype: DataFrame

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params: dict: Estimator parameters.

Returns

self: estimator instance: Estimator instance.

transform(X)

Apply the transformation to the dataframe. Only the selected variables will be modified.

If the Scikit-learn transformer is the OneHotEncoder or the PolynomialFeatures, the new features will be concatenated to the input dataset.

If the Scikit-learn transformer is for feature selection, the non-selected features will be dropped from the dataframe.

For all other transformers, the original variables will be replaced by the transformed ones.

Parameters

X: Pandas DataFrame: The data to transform.

Returns

X_new: Pandas DataFrame: The transformed dataset.

rtype: DataFrame
