MathematicalCombination
- class feature_engine.creation.MathematicalCombination(variables_to_combine, math_operations=None, new_variables_names=None, missing_values='raise', drop_original=False)
MathematicalCombination() applies basic mathematical operations to multiple features, returning one or more additional features as a result. That is, it sums, multiplies, takes the average, maximum, minimum or standard deviation of a group of variables, and returns the results as new variables.
Note that if some of the variables to combine have missing data and you set missing_values='ignore', the missing values will be ignored in the computation. For example, if variables A, B and C have values 10, 20 and NA, and we perform the sum, the result will be A + B = 30. More details in the User Guide.
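The missing-data example above can be reproduced with plain pandas. This is a sketch that assumes missing_values='ignore' behaves like a NaN-skipping row-wise reduction; it does not call MathematicalCombination itself.

```python
import numpy as np
import pandas as pd

# One row reproducing the example: A=10, B=20, C missing.
df = pd.DataFrame({"A": [10.0], "B": [20.0], "C": [np.nan]})

# Row-wise sum; skipna=True (the pandas default) leaves the NaN out of the calculation.
ignored = df[["A", "B", "C"]].sum(axis=1, skipna=True)
print(ignored.iloc[0])  # 30.0

# For contrast: with skipna=False the missing value propagates to the result.
propagated = df[["A", "B", "C"]].sum(axis=1, skipna=False)
print(propagated.iloc[0])  # nan
```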
- Parameters
- variables_to_combine: list
The list of numerical variables to combine.
- math_operations: list, default=None
The list of basic math operations to be used to create the new features.
If None, all of ['sum', 'prod', 'mean', 'std', 'max', 'min'] will be performed. Alternatively, you can enter the list of operations to carry out. Each operation should be a string and must be one of the elements in ['sum', 'prod', 'mean', 'std', 'max', 'min']. Each operation will result in a new variable that will be added to the transformed dataset.
- new_variables_names: list, default=None
Names of the new variables. If passing a list with the names for the new features (recommended), you must enter one name for each mathematical operation indicated in the math_operations parameter. The names must follow the same order as the operations in math_operations. If new_variables_names = None, the transformer will assign an arbitrary name to each newly created feature, starting with the name of the mathematical operation, followed by the combined variables separated by -.
- missing_values: string, default='raise'
Indicates whether missing values should be ignored or should raise an error. If 'raise', the transformer will return an error if the datasets to fit or transform contain missing values. If 'ignore', missing data will be ignored when performing the calculations.
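The parameters above can be approximated with a few lines of pandas. The helper `combine` below is hypothetical, not part of feature_engine: it adds one column per operation (all six when math_operations is None) and applies the 'raise'/'ignore' check. The column names `"<op>_combined"` are placeholders chosen for this sketch.

```python
import numpy as np
import pandas as pd

# Operations performed when math_operations=None, per the description above.
DEFAULT_OPERATIONS = ["sum", "prod", "mean", "std", "max", "min"]


def combine(X, variables_to_combine, math_operations=None, missing_values="raise"):
    """Hypothetical helper mimicking the documented behavior; not the library code."""
    if missing_values not in ("raise", "ignore"):
        raise ValueError("missing_values must be 'raise' or 'ignore'")
    if missing_values == "raise" and X[variables_to_combine].isnull().any().any():
        raise ValueError("Some of the variables to combine contain missing values.")
    operations = DEFAULT_OPERATIONS if math_operations is None else math_operations
    X = X.copy()  # leave the caller's dataframe untouched
    for op in operations:
        # Row-wise reduction by name; pandas skips NaN by default, matching 'ignore'.
        X[f"{op}_combined"] = X[variables_to_combine].agg(op, axis=1)
    return X


df = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0], "C": [5.0, 9.0]})
out = combine(df, ["A", "B", "C"])
print(out.shape)  # (2, 9): 3 original columns plus 6 new ones
```

Note that pandas computes 'std' with ddof=1 (sample standard deviation); whether the transformer uses the same convention is not stated here.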
- Attributes
- combination_dict_:
Dictionary containing the mathematical operation to new variable name pairs.
- math_operations_:
List with the mathematical operations to be applied to the variables_to_combine.
- n_features_in_:
The number of features in the train set used in fit.
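The combination_dict_ attribute pairs each operation with the name of the variable it creates. A minimal sketch of how such a mapping could be built from new_variables_names follows; `build_name_map` is hypothetical, it maps name to operation (the attribute's actual orientation is not spelled out above), and the "_" separating the operation from the joined variable names is an illustrative choice, not the library's format.

```python
def build_name_map(math_operations, variables_to_combine, new_variables_names=None):
    """Hypothetical helper: map each new variable name to the operation that creates it."""
    if new_variables_names is not None:
        if len(new_variables_names) != len(math_operations):
            raise ValueError("Provide exactly one name per operation in math_operations.")
        # Names are matched to operations by position, so their order must agree.
        return dict(zip(new_variables_names, math_operations))
    # Default: operation name followed by the combined variables joined with '-'.
    joined = "-".join(str(v) for v in variables_to_combine)
    return {f"{op}_{joined}": op for op in math_operations}


print(build_name_map(["sum", "mean"], ["A", "B"], ["total", "average"]))
# {'total': 'sum', 'average': 'mean'}
print(build_name_map(["sum", "mean"], ["A", "B"]))
# {'sum_A-B': 'sum', 'mean_A-B': 'mean'}
```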
Notes
Although the transformer in essence allows us to combine any feature with any of the allowed mathematical operations, its use is intended mostly for the creation of new features based on domain knowledge. Typical examples within the financial sector are:
- Sum debt across financial products, e.g., credit cards, to obtain the total debt.
- Take the average payments to various financial products per month.
- Find the minimum payment made in any one month.
In insurance, we can sum the damage to various parts of a car to obtain the total damage.
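The financial examples above boil down to row-wise reductions over groups of related columns. A sketch with plain pandas on invented data (the column names and values are illustrative only); with the transformer, each group would be passed as variables_to_combine with the matching entry in math_operations.

```python
import pandas as pd

# Illustrative data (invented for this sketch), one row per customer.
df = pd.DataFrame({
    "credit_card_debt": [1200.0, 0.0, 350.0],
    "car_loan_debt": [5000.0, 8000.0, 0.0],
    "payment_month_1": [100.0, 250.0, 40.0],
    "payment_month_2": [150.0, 250.0, 0.0],
    "payment_month_3": [200.0, 250.0, 20.0],
})
debts = ["credit_card_debt", "car_loan_debt"]
payments = ["payment_month_1", "payment_month_2", "payment_month_3"]

df["total_debt"] = df[debts].sum(axis=1)        # 'sum' across products
df["mean_payment"] = df[payments].mean(axis=1)  # 'mean' across months
df["min_payment"] = df[payments].min(axis=1)    # 'min' payment in any one month
```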
Methods
fit:
This transformer does not learn parameters.
transform:
Combine the variables with the mathematical operations.
fit_transform:
Fit to the data, then transform it.
- fit(X, y=None)
This transformer does not learn parameters.
It performs dataframe checks and creates the dictionary of operation to new feature name pairs.
- Parameters
- X: pandas dataframe of shape = [n_samples, n_features]
The training input samples. Can be the entire dataframe, not just the variables to transform.
- y: pandas Series, or np.array. Defaults to None.
It is not needed in this transformer. You can pass y or None.
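To make the fit/transform split concrete, here is a simplified, hypothetical stand-in class (not the feature_engine implementation). fit learns nothing from the data: it validates X and records math_operations_, combination_dict_ (oriented name to operation in this sketch) and n_features_in_, counting every column of X as the docstring allows the whole dataframe to be passed. transform then adds one column per operation.

```python
import pandas as pd

DEFAULT_OPERATIONS = ["sum", "prod", "mean", "std", "max", "min"]


class SimpleMathCombination:
    """Hypothetical, simplified stand-in for MathematicalCombination (not the library code)."""

    def __init__(self, variables_to_combine, math_operations=None,
                 new_variables_names=None, missing_values="raise"):
        self.variables_to_combine = variables_to_combine
        self.math_operations = math_operations
        self.new_variables_names = new_variables_names
        self.missing_values = missing_values

    def _check(self, X):
        # Dataframe checks shared by fit and transform.
        if not isinstance(X, pd.DataFrame):
            raise TypeError("X must be a pandas DataFrame.")
        if self.missing_values == "raise" and X[self.variables_to_combine].isnull().any().any():
            raise ValueError("Some of the variables to combine contain missing values.")

    def fit(self, X, y=None):
        # Nothing is learned from the data: fit only validates X and stores bookkeeping.
        self._check(X)
        if self.math_operations is None:
            self.math_operations_ = list(DEFAULT_OPERATIONS)
        else:
            self.math_operations_ = list(self.math_operations)
        if self.new_variables_names is None:
            # Illustrative default naming; the library's exact format may differ.
            joined = "-".join(map(str, self.variables_to_combine))
            names = [f"{op}_{joined}" for op in self.math_operations_]
        else:
            names = list(self.new_variables_names)
        # Maps each new variable name to the operation that produces it.
        self.combination_dict_ = dict(zip(names, self.math_operations_))
        self.n_features_in_ = X.shape[1]  # all columns, not only the combined ones
        return self

    def transform(self, X):
        self._check(X)
        X = X.copy()
        for name, op in self.combination_dict_.items():
            X[name] = X[self.variables_to_combine].agg(op, axis=1)
        return X

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


df = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0], "ID": ["x", "y"]})
combo = SimpleMathCombination(["A", "B"], ["sum", "mean"], ["total", "average"])
out = combo.fit_transform(df)
```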
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
- X: array-like of shape (n_samples, n_features)
Input samples.
- y: array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params: dict
Additional fit parameters.
- Returns
- X_new: pandas dataframe of shape (n_samples, n_features_new)
Transformed dataframe.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
- deep: bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- params: dict
Parameter names mapped to their values.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters
- **params: dict
Estimator parameters.
- Returns
- self: estimator instance
Estimator instance.
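A short illustration of the <component>__<parameter> convention. To keep it independent of feature_engine, it uses scikit-learn's own Pipeline with a StandardScaler as a stand-in component; for a MathematicalCombination step registered under a hypothetical name such as 'combo', the same pattern would give keys like combo__math_operations.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scaler", StandardScaler())])

# Nested parameters are addressed as <component>__<parameter>.
pipe.set_params(scaler__with_mean=False)

# get_params(deep=True) exposes the nested parameter under the same key.
print(pipe.get_params()["scaler__with_mean"])  # False
print(pipe.named_steps["scaler"].with_mean)    # False
```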
- transform(X)
Combine the variables with the mathematical operations.
- Parameters
- X: pandas dataframe of shape = [n_samples, n_features]
The data to transform.
- Returns
- X_new: Pandas dataframe, shape = [n_samples, n_features + n_operations]
The dataframe with the original variables plus the new variables.