MeanMedianImputer
- class feature_engine.imputation.MeanMedianImputer(imputation_method='median', variables=None)
The MeanMedianImputer() replaces missing data with the mean or median value of each variable. It works only with numerical variables.
You can pass a list of variables to impute. Alternatively, the MeanMedianImputer() will automatically select all numerical variables in the training set.
More details are in the User Guide.
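To make the behaviour described above concrete, here is a plain-pandas sketch of the same idea. It is not feature_engine's implementation: `impute_mean_median` is a hypothetical helper written only for illustration, and it assumes that "numerical" can be approximated with `select_dtypes(include="number")`. The replacement values are learned from `train` and then applied to `data`, which mirrors the transformer's split between fit and transform.

```python
import numpy as np
import pandas as pd


def impute_mean_median(train, data, method="median", variables=None):
    """Hypothetical helper mimicking mean/median imputation with pandas.

    Illustration only; this is not feature_engine's code.
    """
    if method not in ("mean", "median"):
        raise ValueError("method must be 'mean' or 'median'")
    if variables is None:
        # Assumption: "numerical" means any column pandas treats as a number.
        variables = train.select_dtypes(include="number").columns.tolist()
    # Learn one replacement value per variable from the training data only.
    fill_values = getattr(train[variables], method)().to_dict()
    # Fill NaN in the selected variables; other columns keep their NaN.
    return data.fillna(value=fill_values)


X = pd.DataFrame({
    "x1": [np.nan, 1, 1, 0, np.nan],
    "x2": ["a", np.nan, "b", np.nan, "a"],
})
print(impute_mean_median(X, X, method="median"))
```

With `method="median"` the learned value for x1 is 1.0, the median of the observed values 1, 1 and 0; the string column x2 is skipped and keeps its missing values, as in the Examples section.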
- Parameters
- imputation_method: str, default=’median’
Desired method of imputation. Can take ‘mean’ or ‘median’.
- variables: list, default=None
The list of numerical variables to transform. If None, the transformer will automatically find and select all numerical variables.
- Attributes
- imputer_dict_:
Dictionary with the values to replace missing data in each variable.
- variables_:
The group of variables that will be transformed.
- feature_names_in_:
List with the names of features seen during fit.
- n_features_in_:
The number of features in the train set used in fit.
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from feature_engine.imputation import MeanMedianImputer
>>> X = pd.DataFrame(dict(
...        x1 = [np.nan,1,1,0,np.nan],
...        x2 = ["a", np.nan, "b", np.nan, "a"],
...        ))
>>> mmi = MeanMedianImputer(imputation_method='median')
>>> mmi.fit(X)
>>> mmi.transform(X)
    x1   x2
0  1.0    a
1  1.0  NaN
2  1.0    b
3  0.0  NaN
4  1.0    a
Methods
- fit: Learn the mean or median values.
- fit_transform: Fit to data, then transform it.
- get_feature_names_out: Get output feature names for transformation.
- get_params: Get parameters for this estimator.
- set_params: Set the parameters of this estimator.
- transform: Impute missing data.
- fit(X, y=None)
Learn the mean or median values.
- Parameters
- X: pandas dataframe of shape = [n_samples, n_features]
The training dataset.
- y: pandas series or None, default=None
y is not needed in this imputation. You can pass None or y.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
- X: array-like of shape (n_samples, n_features)
Input samples.
- y: array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params: dict
Additional fit parameters.
- Returns
- X_new: pandas dataframe of shape = [n_samples, n_features]
Transformed dataframe, as returned by transform.
- get_feature_names_out(input_features=None)
Get output feature names for transformation. In other words, returns the variable names of the transformed dataframe.
- Parameters
- input_features: array or list, default=None
This parameter exists only for compatibility with the Scikit-learn pipeline.
If None, then feature_names_in_ is used as the input feature names. If an array or list, then input_features must match feature_names_in_.
- Returns
- feature_names_out: list
Transformed feature names.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
- deep: bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- params: dict
Parameter names mapped to their values.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
- **params: dict
Estimator parameters.
- Returns
- self: estimator instance
Estimator instance.
- transform(X)
Replace missing data with the learned parameters.
- Parameters
- X: pandas dataframe of shape = [n_samples, n_features]
The data to be transformed.
- Returns
- X_new: pandas dataframe of shape = [n_samples, n_features]
The dataframe without missing values in the selected variables.
- rtype
DataFrame