ArbitraryDiscretiser
- class feature_engine.discretisation.ArbitraryDiscretiser(binning_dict, return_object=False, return_boundaries=False, precision=3, errors='ignore')
The ArbitraryDiscretiser() divides numerical variables into intervals whose limits are determined by the user. Thus, it works only with numerical variables.
You need to enter a dictionary with variable names as keys, and a list with the limits of the intervals as values. For example, the dictionary could map the variable name ‘var1’ to the list [0, 10, 100, 1000] and ‘var2’ to the list [5, 10, 15, 20]. The ArbitraryDiscretiser() will then sort var1 values into the intervals 0-10, 10-100, 100-1000, and var2 into 5-10, 10-15 and 15-20. This is similar to pandas.cut.
More details in the User Guide.
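The interval assignment described above can be sketched in plain Python. This is an illustration only, not Feature-engine's implementation: the helper `assign_bin` and the sample `data` are hypothetical, and the sketch assumes right-closed intervals with the lowest limit included in the first interval, which is how `pandas.cut` behaves with `include_lowest=True`.

```python
from bisect import bisect_left


def assign_bin(value, limits):
    """Return the 0-based index of the interval containing value, or None.

    Hypothetical helper. Assumes sorted limits and right-closed intervals
    (a, b], with the lowest limit included in the first interval.
    """
    if value < limits[0] or value > limits[-1]:
        return None  # outside the user-defined limits
    if value == limits[0]:
        return 0  # the lowest limit belongs to the first interval
    # bisect_left finds the first limit >= value; the interval index is one less
    return bisect_left(limits, value) - 1


# Same shape as the binning_dict parameter: variable name -> interval limits
binning_dict = {"var1": [0, 10, 100, 1000], "var2": [5, 10, 15, 20]}
data = {"var1": [3, 10, 250], "var2": [5, 12, 20]}

binned = {
    name: [assign_bin(v, binning_dict[name]) for v in values]
    for name, values in data.items()
}
print(binned)
```

Each value is replaced by the position of its interval, which corresponds to the integer output the transformer returns by default.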
- Parameters
- binning_dict: dict
The dictionary with the variable to interval limits pairs.
- return_object: bool, default=False
Whether the discrete variable should be returned as type numeric or type object. If you would like to encode the discrete variables with Feature-engine’s categorical encoders, use True. Otherwise, keep the default of False.
- return_boundaries: bool, default=False
Whether the output should be the interval boundaries. If True, it returns the interval boundaries. If False, it returns integers.
- precision: int, default=3
The precision at which to store and display the bin labels.
- errors: string, default=’ignore’
Indicates what to do when a value is outside the limits indicated in the ‘binning_dict’. If ‘raise’, the transformation will raise an error. If ‘ignore’, values outside the limits are returned as NaN and a warning will be raised instead.
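The effect of `return_boundaries` and `errors` can be illustrated with a small standard-library sketch. The function `discretise` is hypothetical and only mimics the semantics listed above. The use of `ValueError` for the ‘raise’ case and the label format for the boundaries are assumptions; the labels produced by the library may look different.

```python
import math
import warnings
from bisect import bisect_left


def discretise(values, limits, return_boundaries=False, errors="ignore"):
    """Hypothetical sketch of the parameter semantics, not the library code."""
    out = []
    for v in values:
        if v < limits[0] or v > limits[-1]:
            if errors == "raise":
                # Assumed error type for illustration
                raise ValueError(f"{v} is outside the limits {limits}")
            # errors="ignore": warn and return NaN for this value
            warnings.warn(f"{v} is outside the limits and becomes NaN")
            out.append(math.nan)
            continue
        # Right-closed intervals; the lowest limit falls into interval 0
        idx = max(bisect_left(limits, v) - 1, 0)
        if return_boundaries:
            out.append(f"({limits[idx]}, {limits[idx + 1]}]")  # illustrative label
        else:
            out.append(idx)
    return out


limits = [0, 10, 100]
codes = discretise([5, 50], limits)
labels = discretise([5, 50], limits, return_boundaries=True)
print(codes)
print(labels)

# Suppress the warning here only to keep the demonstration output clean
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    with_nan = discretise([5, 500], limits)
print(with_nan)
```

With `errors="raise"`, the same out-of-range value would stop the transformation instead of producing NaN.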
- Attributes
- binner_dict_:
Dictionary with the interval limits per variable.
- variables_:
The group of variables that will be transformed.
- feature_names_in_:
List with the names of features seen during fit.
- n_features_in_:
The number of features in the train set used in fit.
See also
pandas.cut
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from feature_engine.discretisation import ArbitraryDiscretiser
>>> np.random.seed(42)
>>> X = pd.DataFrame(dict(x = np.random.randint(1,100, 100)))
>>> bins = dict(x = [0, 25, 50, 75, 100])
>>> ad = ArbitraryDiscretiser(binning_dict = bins)
>>> ad.fit(X)
>>> ad.transform(X)["x"].value_counts()
2    31
0    27
3    25
1    17
Name: x, dtype: int64
Methods
fit:
This transformer does not learn parameters.
fit_transform:
Fit to data, then transform it.
get_feature_names_out:
Get output feature names for transformation.
get_params:
Get parameters for this estimator.
set_params:
Set the parameters of this estimator.
transform:
Sort continuous variable values into the intervals.
- fit(X, y=None)
This transformer does not learn any parameter.
- Parameters
- X: pandas dataframe of shape = [n_samples, n_features]
The training dataset. Can be the entire dataframe, not just the variables to be transformed.
- y: None
y is not needed in this transformer. You can pass y or None.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
- X: array-like of shape (n_samples, n_features)
Input samples.
- y: array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params: dict
Additional fit parameters.
- Returns
- X_new: ndarray array of shape (n_samples, n_features_new)
Transformed array.
- get_feature_names_out(input_features=None)
Get output feature names for transformation. In other words, returns the variable names of the transformed dataframe.
- Parameters
- input_features: array or list, default=None
This parameter exists only for compatibility with the Scikit-learn pipeline.
If None, then feature_names_in_ is used as feature names in.
If an array or list, then input_features must match feature_names_in_.
- Returns
- feature_names_out: list
Transformed feature names.
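The rule for `input_features` can be written as a short plain-Python sketch. The helper `resolve_feature_names` is hypothetical and the choice of `ValueError` for a mismatch is an assumption; it only mirrors the behaviour described above and is not the library's code.

```python
def resolve_feature_names(feature_names_in, input_features=None):
    """Hypothetical helper mirroring the input_features rule."""
    if input_features is None:
        # Fall back to the names seen during fit
        return list(feature_names_in)
    if list(input_features) != list(feature_names_in):
        # Assumed error type for illustration
        raise ValueError("input_features must match feature_names_in_")
    return list(input_features)


seen_in_fit = ["var1", "var2"]
default_names = resolve_feature_names(seen_in_fit)
explicit_names = resolve_feature_names(seen_in_fit, ["var1", "var2"])
print(default_names)
print(explicit_names)
```

Because the discretiser replaces values in place and adds no columns, the output names are the same as the input names in both cases.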
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
- deep: bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- params: dict
Parameter names mapped to their values.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
- **params: dict
Estimator parameters.
- Returns
- self: estimator instance
Estimator instance.
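How a `<component>__<parameter>` name reaches a nested object can be sketched without any library. The classes `Step` and `Container` below are hypothetical stand-ins for an estimator and a pipeline-like object; this is a simplified illustration of the naming convention, not Scikit-learn's implementation.

```python
class Step:
    """Hypothetical minimal estimator with a single parameter."""

    def __init__(self, precision=3):
        self.precision = precision

    def set_params(self, **params):
        for name, value in params.items():
            if not hasattr(self, name):
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self  # returning self allows chained calls


class Container:
    """Hypothetical nested object holding named components."""

    def __init__(self, **components):
        self.components = components

    def set_params(self, **params):
        for key, value in params.items():
            # Split '<component>__<parameter>' at the first double underscore
            component, sep, sub_key = key.partition("__")
            if not sep:
                raise ValueError(f"Expected '<component>__<parameter>', got {key!r}")
            # Delegate the remaining name to the named component
            self.components[component].set_params(**{sub_key: value})
        return self


pipe = Container(disc=Step())
returned = pipe.set_params(disc__precision=5)
print(pipe.components["disc"].precision)
```

The call updates only the `disc` component and returns the container itself, matching the "Returns self" contract described above.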