SklearnTransformerWrapper
API Reference
- class feature_engine.wrappers.SklearnTransformerWrapper(transformer, variables=None)
Wrapper to apply Scikit-learn transformers to a selected group of variables. It works with transformers like the SimpleImputer, OrdinalEncoder, OneHotEncoder, all the scalers and also the transformers for feature selection.
- Parameters
- transformer: sklearn transformer
The desired Scikit-learn transformer.
- variables: list, default=None
The list of variables to be transformed. If None, the wrapper selects all numerical variables for every transformer except the SimpleImputer, OrdinalEncoder and OneHotEncoder, for which it selects all variables in the dataset.
Attributes
transformer_:
The fitted Scikit-learn transformer.
variables_:
The group of variables that will be transformed.
n_features_in_:
The number of features in the train set used in fit.
Methods
fit:
Fit Scikit-learn transformer
transform:
Transform data with Scikit-learn transformer
fit_transform:
Fit to data, then transform it.
- fit(X, y=None)
Fits the Scikit-learn transformer to the selected variables.
If the variables parameter is None, the OneHotEncoder, OrdinalEncoder and SimpleImputer are applied to all variables in the dataset. For every other transformer, only the numerical variables are selected and transformed.
If you pass a list to the variables parameter, the SklearnTransformerWrapper checks that those variables exist in the dataframe and that they are numerical. The numerical check is skipped for the OneHotEncoder, OrdinalEncoder and SimpleImputer, which also accept categorical variables.
- Parameters
- X: Pandas DataFrame
The dataset to fit the transformer
- y: pandas Series, default=None
The target variable.
- Returns
- self
- Raises
- TypeError
If the input is not a Pandas DataFrame
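The selection rule described for fit can be sketched with pandas alone. This is a minimal illustration of the documented behaviour, not the library's implementation: the helper `select_variables`, the set `ALL_TYPES`, the idea of passing the transformer's class name as a string, and the toy dataframe are all assumptions made for the example.

```python
import pandas as pd

# Transformers that, per the description above, accept every variable type.
ALL_TYPES = {"SimpleImputer", "OrdinalEncoder", "OneHotEncoder"}


def select_variables(X, transformer_name, variables=None):
    """Hypothetical helper that mirrors the documented selection rule."""
    if variables is None:
        if transformer_name in ALL_TYPES:
            # These transformers are applied to all variables.
            return list(X.columns)
        # Every other transformer only receives the numerical variables.
        return list(X.select_dtypes(include="number").columns)

    # A user-supplied list must refer to existing columns.
    missing = [v for v in variables if v not in X.columns]
    if missing:
        raise KeyError(f"Variables not found in the dataframe: {missing}")

    # Outside the three exceptions, the variables must be numerical.
    if transformer_name not in ALL_TYPES:
        non_numeric = [
            v for v in variables if not pd.api.types.is_numeric_dtype(X[v])
        ]
        if non_numeric:
            raise TypeError(f"Variables are not numerical: {non_numeric}")
    return list(variables)


# Toy data, made up for illustration.
df = pd.DataFrame(
    {"area": [50.0, 80.0], "rooms": [2, 3], "city": ["A", "B"]}
)

numeric_only = select_variables(df, "StandardScaler")
everything = select_variables(df, "OneHotEncoder")
```

With this toy dataframe, `numeric_only` holds the two numerical columns and `everything` holds all three, while asking the sketch to scale `city` raises a TypeError.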
- transform(X)
Apply the transformation to the dataframe. Only the selected variables will be modified.
If the transformer is the OneHotEncoder, the dummy features are concatenated to the input dataset. Note that the original categorical variables are not removed from the dataset after encoding. If you want them removed, use Feature-engine’s OneHotEncoder instead.
- Parameters
- X: Pandas DataFrame
The data to transform
- Returns
- X: Pandas DataFrame
The transformed dataset.
- Raises
- TypeError
If the input is not a Pandas DataFrame
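The OneHotEncoder behaviour described for transform (dummy features appended, original categorical variables kept) can be pictured with Scikit-learn and pandas directly. The sketch below is an assumption-laden illustration of that outcome, not the wrapper's internal code: the toy dataframe and the `variable_category` column naming are invented for the example.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy data, made up for illustration.
df = pd.DataFrame({"city": ["A", "B", "A"], "area": [50.0, 80.0, 110.0]})
cat_vars = ["city"]

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default, so convert it to dense.
dummies = encoder.fit_transform(df[cat_vars]).toarray()

# Build column names from the learned categories (illustrative naming only).
names = [
    f"{var}_{cat}"
    for var, cats in zip(cat_vars, encoder.categories_)
    for cat in cats
]
dummies_df = pd.DataFrame(dummies, columns=names, index=df.index)

# Concatenate the dummies to the input; the original 'city' column remains.
encoded = pd.concat([df, dummies_df], axis=1)
```

Here `encoded` still contains `city` next to `city_A` and `city_B`. Dropping the original column afterwards would be a separate step, for example `encoded.drop(columns=cat_vars)`.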
Example
The wrapper applies Scikit-learn transformers like the SimpleImputer, the OrdinalEncoder or most scalers only to the selected subset of features.
In the next code snippet we show how to wrap the SimpleImputer from Scikit-learn to impute only the selected variables.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from feature_engine.wrappers import SklearnTransformerWrapper
# Load dataset
data = pd.read_csv('houseprice.csv')
# Separate into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    data.drop(['Id', 'SalePrice'], axis=1),
    data['SalePrice'], test_size=0.3, random_state=0)
# set up the wrapper with the SimpleImputer
imputer = SklearnTransformerWrapper(
    transformer=SimpleImputer(strategy='mean'),
    variables=['LotFrontage', 'MasVnrArea'])
# fit the wrapper + SimpleImputer
imputer.fit(X_train)
# transform the data
X_train = imputer.transform(X_train)
X_test = imputer.transform(X_test)
In the next snippet of code we show how to wrap the StandardScaler from Scikit-learn to standardize only the selected variables.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from feature_engine.wrappers import SklearnTransformerWrapper
# Load dataset
data = pd.read_csv('houseprice.csv')
# Separate into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    data.drop(['Id', 'SalePrice'], axis=1),
    data['SalePrice'], test_size=0.3, random_state=0)
# set up the wrapper with the StandardScaler
scaler = SklearnTransformerWrapper(
    transformer=StandardScaler(),
    variables=['LotFrontage', 'MasVnrArea'])
# fit the wrapper + StandardScaler
scaler.fit(X_train)
# transform the data
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
In the next snippet of code we show how to wrap the SelectKBest from Scikit-learn to select only a subset of the variables.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import f_regression, SelectKBest
from feature_engine.wrappers import SklearnTransformerWrapper
# Load dataset
data = pd.read_csv('houseprice.csv')
# Separate into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    data.drop(['Id', 'SalePrice'], axis=1),
    data['SalePrice'], test_size=0.3, random_state=0)
# select the numerical variables
cols = [var for var in X_train.columns if X_train[var].dtype != 'O']
# set up the wrapper with SelectKBest to keep the 5 best of the above variables
selector = SklearnTransformerWrapper(
    transformer=SelectKBest(f_regression, k=5),
    variables=cols)
# fit the wrapper + SelectKBest, with missing values filled with 0
selector.fit(X_train.fillna(0), y_train)
# transform the data
X_train_t = selector.transform(X_train.fillna(0))
X_test_t = selector.transform(X_test.fillna(0))