ArbitraryOutlierCapper
- class feature_engine.outliers.ArbitraryOutlierCapper(max_capping_dict=None, min_capping_dict=None, missing_values='raise')
The ArbitraryOutlierCapper() caps the maximum and/or minimum values of a variable at arbitrary values indicated by the user.
You must provide the maximum and/or minimum values used to cap each variable in dictionaries, with the features as keys and the capping values as values.
More details in the User Guide.
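As an illustration only, the effect of arbitrary capping can be reproduced with plain pandas: each capped variable is clipped at the user-given maximum and/or minimum. The sketch below is not feature_engine's implementation; the helper `cap_arbitrary` and its `max_caps`/`min_caps` arguments are hypothetical names chosen for this example, and it assumes pandas is installed.

```python
import pandas as pd


def cap_arbitrary(df, max_caps=None, min_caps=None):
    """Hypothetical helper: cap columns at user-given values (not the library's code)."""
    out = df.copy()  # leave the caller's dataframe untouched
    # Right tail: values above the cap are replaced by the cap.
    for var, cap in (max_caps or {}).items():
        out[var] = out[var].clip(upper=cap)
    # Left tail: values below the cap are replaced by the cap.
    for var, cap in (min_caps or {}).items():
        out[var] = out[var].clip(lower=cap)
    return out


X = pd.DataFrame({"x1": list(range(1, 11))})
print(cap_arbitrary(X, max_caps={"x1": 8}, min_caps={"x1": 2})["x1"].tolist())
# [2, 2, 3, 4, 5, 6, 7, 8, 8, 8]
```

On the data used in the Examples section, this yields the same capped column as the transformer.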
- Parameters
- max_capping_dict: dictionary, default=None
Dictionary containing the user-specified capping values for the right tail of the distribution of each variable to cap (maximum values).
- min_capping_dict: dictionary, default=None
Dictionary containing the user-specified capping values for the left tail of the distribution of each variable to cap (minimum values).
- missing_values: string, default='raise'
Indicates whether missing values should be ignored or raised. If 'raise', the transformer will return an error if the datasets passed to fit or transform contain missing values. If 'ignore', missing data will be ignored when learning parameters or performing the transformation.
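The missing_values policy can be pictured with the hypothetical sketch below (right tail only, to keep it short). The function name, the error message and the check on invalid options are assumptions for illustration; the only behavior taken from the text is that 'raise' errors on missing data and 'ignore' leaves it alone. pandas `Series.clip` passes NaN through unchanged, which is what makes 'ignore' straightforward here.

```python
import numpy as np
import pandas as pd


def cap_with_missing_policy(df, max_caps, missing_values="raise"):
    """Hypothetical sketch of the 'raise'/'ignore' policy (right tail only)."""
    if missing_values not in ("raise", "ignore"):
        raise ValueError("missing_values must be 'raise' or 'ignore'")
    variables = list(max_caps)
    # 'raise': refuse to work if any variable to cap contains NaN.
    if missing_values == "raise" and df[variables].isnull().any().any():
        raise ValueError("Some of the variables to cap contain missing values.")
    out = df.copy()
    for var, cap in max_caps.items():
        # Series.clip leaves NaN untouched, so 'ignore' simply skips missing rows.
        out[var] = out[var].clip(upper=cap)
    return out


X = pd.DataFrame({"x1": [1.0, np.nan, 10.0]})
print(cap_with_missing_policy(X, {"x1": 8}, missing_values="ignore")["x1"].tolist())
# [1.0, nan, 8.0]
```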
- Attributes
- right_tail_caps_:
Dictionary with the maximum values beyond which a value will be considered an outlier.
- left_tail_caps_:
Dictionary with the minimum values below which a value will be considered an outlier.
- variables_:
The group of variables that will be transformed.
- feature_names_in_:
List with the names of features seen during fit.
- n_features_in_:
The number of features in the train set used in fit.
Examples
>>> import pandas as pd
>>> from feature_engine.outliers import ArbitraryOutlierCapper
>>> X = pd.DataFrame(dict(x1=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
>>> aoc = ArbitraryOutlierCapper(max_capping_dict=dict(x1=8),
...                              min_capping_dict=dict(x1=2))
>>> aoc.fit(X)
>>> aoc.transform(X)
   x1
0   2
1   2
2   3
3   4
4   5
5   6
6   7
7   8
8   8
9   8
Methods
- fit:
This transformer does not learn parameters.
- fit_transform:
Fit to data, then transform it.
- get_feature_names_out:
Get output feature names for transformation.
- get_params:
Get parameters for this estimator.
- set_params:
Set the parameters of this estimator.
- transform:
Cap the variables.
- fit(X, y=None)
This transformer does not learn any parameters from the data.
- Parameters
- X: pandas dataframe of shape = [n_samples, n_features]
The training input samples.
- y: pandas Series, default=None
y is not needed in this transformer. You can pass y or None.
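Since fit estimates nothing from X, its role reduces to storing the user-supplied caps and a few bookkeeping attributes. The class below is a simplified, hypothetical stand-in written to mirror the attributes listed above; it is not feature_engine's code, skips input validation, and assumes that `variables_` is the union of the keys of both dictionaries.

```python
import pandas as pd


class MiniArbitraryCapper:
    """Hypothetical, simplified stand-in; not feature_engine's implementation."""

    def __init__(self, max_capping_dict=None, min_capping_dict=None):
        self.max_capping_dict = max_capping_dict
        self.min_capping_dict = min_capping_dict

    def fit(self, X, y=None):
        # Nothing is estimated from X: the caps are copied from the user's dicts.
        self.right_tail_caps_ = dict(self.max_capping_dict or {})
        self.left_tail_caps_ = dict(self.min_capping_dict or {})
        # Assumption: the variables to transform are the keys of both dicts.
        self.variables_ = list({**self.left_tail_caps_, **self.right_tail_caps_})
        self.feature_names_in_ = list(X.columns)
        self.n_features_in_ = X.shape[1]
        return self

    def transform(self, X):
        X = X.copy()
        for var in self.variables_:
            # A bound of None means "no cap on that side".
            X[var] = X[var].clip(
                lower=self.left_tail_caps_.get(var),
                upper=self.right_tail_caps_.get(var),
            )
        return X


# Fitting on very different data stores exactly the same caps.
a = MiniArbitraryCapper(max_capping_dict={"x1": 8}).fit(pd.DataFrame({"x1": [1, 2, 3]}))
b = MiniArbitraryCapper(max_capping_dict={"x1": 8}).fit(pd.DataFrame({"x1": [100, 200, 300]}))
print(a.right_tail_caps_ == b.right_tail_caps_)  # True
```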
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
- X: array-like of shape (n_samples, n_features)
Input samples.
- y: array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params: dict
Additional fit parameters.
- Returns
- X_new: array-like of shape (n_samples, n_features_new)
Transformed data.
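The description of fit_transform matches scikit-learn's `TransformerMixin.fit_transform`, which chains fit and transform. The hypothetical `TinyMaxCapper` below (assuming scikit-learn and pandas are installed) defines only fit and transform and inherits fit_transform from the mixin, so both routes give the same result. It is an illustration of the pattern, not feature_engine's class.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class TinyMaxCapper(BaseEstimator, TransformerMixin):
    """Hypothetical right-tail capper used only to illustrate fit_transform."""

    def __init__(self, max_capping_dict=None):
        self.max_capping_dict = max_capping_dict

    def fit(self, X, y=None):
        # Store the user's caps; nothing is learned from X.
        self.right_tail_caps_ = dict(self.max_capping_dict or {})
        return self

    def transform(self, X):
        X = X.copy()
        for var, cap in self.right_tail_caps_.items():
            X[var] = X[var].clip(upper=cap)
        return X


X = pd.DataFrame({"x1": [1, 5, 10]})
capper = TinyMaxCapper(max_capping_dict={"x1": 8})
# fit_transform is inherited from TransformerMixin: fit(X) followed by transform(X).
print(capper.fit_transform(X)["x1"].tolist())  # [1, 5, 8]
```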
- get_feature_names_out(input_features=None)
Get output feature names for transformation. In other words, returns the variable names of the transformed dataframe.
- Parameters
- input_features: array or list, default=None
This parameter exists only for compatibility with the Scikit-learn pipeline.
If None, then feature_names_in_ is used as the input feature names. If an array or list, then input_features must match feature_names_in_.
- Returns
- feature_names_out: list
Transformed feature names.
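The rules above for input_features can be written as a small function. This is a hypothetical sketch, not the library's code: the exact comparison (same names in the same order) and raising `ValueError` on a mismatch are assumptions. Because capping changes values but not columns, the output names equal the input names.

```python
def feature_names_out(feature_names_in, input_features=None):
    """Hypothetical sketch of the documented rules; not feature_engine's code."""
    if input_features is None:
        # Fall back to the names seen during fit.
        return list(feature_names_in)
    if list(input_features) != list(feature_names_in):
        # Assumption: a mismatch is reported as a ValueError.
        raise ValueError("input_features must match feature_names_in_.")
    # Capping only changes values, so the column names pass through unchanged.
    return list(input_features)


print(feature_names_out(["x1", "x2"]))  # ['x1', 'x2']
```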
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
- deep: bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- params: dict
Parameter names mapped to their values.
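get_params appears to be inherited from scikit-learn's `BaseEstimator.get_params`, which reads the constructor arguments back from the instance. The hypothetical `CapperLike` class below reuses the constructor signature of ArbitraryOutlierCapper to show what the returned dictionary looks like; it assumes scikit-learn is installed and is not the real transformer.

```python
from sklearn.base import BaseEstimator


class CapperLike(BaseEstimator):
    """Hypothetical estimator with ArbitraryOutlierCapper's constructor arguments."""

    def __init__(self, max_capping_dict=None, min_capping_dict=None, missing_values="raise"):
        # BaseEstimator.get_params reads these attributes by constructor argument name.
        self.max_capping_dict = max_capping_dict
        self.min_capping_dict = min_capping_dict
        self.missing_values = missing_values


params = CapperLike(max_capping_dict={"x1": 8}).get_params()
print(params)
# {'max_capping_dict': {'x1': 8}, 'min_capping_dict': None, 'missing_values': 'raise'}
```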
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters
- **params: dict
Estimator parameters.
- Returns
- self: estimator instance
Estimator instance.
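The <component>__<parameter> convention can be seen with a one-step scikit-learn Pipeline. `CapperLike` is a hypothetical stand-in with the same constructor arguments as ArbitraryOutlierCapper, and the step name "capper" is arbitrary; that the real transformer follows the same call pattern is an assumption based on the description above. The sketch assumes scikit-learn is installed.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class CapperLike(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in with ArbitraryOutlierCapper's constructor arguments."""

    def __init__(self, max_capping_dict=None, min_capping_dict=None, missing_values="raise"):
        self.max_capping_dict = max_capping_dict
        self.min_capping_dict = min_capping_dict
        self.missing_values = missing_values

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X  # no-op: only the parameter handling matters here


# Simple estimator: parameters are set by name and the estimator itself is returned.
capper = CapperLike().set_params(missing_values="ignore")

# Nested object: the step name and the parameter are joined by a double underscore.
pipe = Pipeline([("capper", CapperLike())])
pipe.set_params(capper__max_capping_dict={"x1": 8})
print(pipe.named_steps["capper"].max_capping_dict)  # {'x1': 8}
```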