DropConstantFeatures
- class feature_engine.selection.DropConstantFeatures(variables=None, tol=1, missing_values='raise', confirm_variables=False)
DropConstantFeatures() drops constant and quasi-constant variables from a dataframe. Constant variables show the same value in all the observations in the dataset. Quasi-constant variables show the same value in almost all the observations in the dataset.
This transformer works with numerical and categorical variables. The user can indicate a list of variables to examine. Alternatively, the transformer will evaluate all the variables in the dataset.
The transformer will first identify and store the constant and quasi-constant variables. Next, the transformer will drop these variables from a dataframe.
More details in the User Guide.
- Parameters
- variables: list, default=None
The list of variables to evaluate. If None, the transformer will evaluate all variables in the dataset.
- tol: float, int, default=1
Threshold to detect constant/quasi-constant features. Variables showing the same value in a fraction of observations greater than or equal to tol will be considered constant / quasi-constant and dropped. If tol=1, the transformer removes constant variables. Otherwise, it removes quasi-constant variables. For example, if tol=0.98, the transformer will remove variables that show the same value in at least 98% of the observations.
- missing_values: str, default='raise'
Whether missing values should raise an error, be ignored, or be included as an additional value of the variable. Takes values 'raise', 'ignore', 'include'.
- confirm_variables: bool, default=False
If set to True, variables that are not present in the input dataframe will be removed from the list of variables. Only used when passing a variable list to the parameter variables. See parameter variables for more details.
- Attributes
- features_to_drop_:
List with constant and quasi-constant features.
- variables_:
The variables that will be considered for the feature selection procedure.
- feature_names_in_:
List with the names of features seen during fit.
- n_features_in_:
The number of features in the train set used in fit.
See also
sklearn.feature_selection.VarianceThreshold
Notes
This transformer is similar in concept to the VarianceThreshold from Scikit-learn, but it evaluates the number of unique values instead of the variance.
Examples
>>> import pandas as pd
>>> from feature_engine.selection import DropConstantFeatures
>>> X = pd.DataFrame(dict(x1 = [1,1,1,1],
...                       x2 = ["a", "a", "b", "c"],
...                       x3 = [True, False, False, True]))
>>> dcf = DropConstantFeatures()
>>> dcf.fit_transform(X)
  x2     x3
0  a   True
1  a  False
2  b  False
3  c   True
Additionally, you can set the threshold for quasi-constant features:
>>> X = pd.DataFrame(dict(x1 = [1,1,1,1],
...                       x2 = ["a", "a", "b", "c"],
...                       x3 = [True, False, False, False]))
>>> dcf = DropConstantFeatures(tol = 0.75)
>>> dcf.fit_transform(X)
  x2
0  a
1  a
2  b
3  c
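The tol criterion from the Parameters section can be checked by hand with plain pandas. The sketch below is not the library's implementation: `predominant_fraction` is a hypothetical helper, and the `>=` comparison is an assumption inferred from the example above, where x3 holds the same value in exactly 75% of the rows and is dropped with tol=0.75.

```python
import pandas as pd


def predominant_fraction(series: pd.Series) -> float:
    """Share of rows taken by the most frequent value (hypothetical helper)."""
    # value_counts sorts by frequency in descending order by default,
    # so the first entry belongs to the most frequent value.
    return series.value_counts(normalize=True).iloc[0]


X = pd.DataFrame(dict(x1=[1, 1, 1, 1],
                      x2=["a", "a", "b", "c"],
                      x3=[True, False, False, False]))

tol = 0.75

# Fraction of rows occupied by the predominant value of each column.
fractions = {col: predominant_fraction(X[col]) for col in X.columns}

# Assumed rule: a column is flagged when that fraction reaches tol.
flagged = [col for col, frac in fractions.items() if frac >= tol]

print(fractions)
print(flagged)
```

Under these assumptions the flagged columns are x1 and x3, which matches the columns removed in the tol=0.75 example.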
Methods
fit:
Find constant and quasi-constant features.
fit_transform:
Fit to data, then transform it.
get_feature_names_out:
Get output feature names for transformation.
get_params:
Get parameters for this estimator.
set_params:
Set the parameters of this estimator.
get_support:
Get a mask, or integer index, of the features selected.
transform:
Remove constant and quasi-constant features.
- fit(X, y=None)
Find constant and quasi-constant features.
- Parameters
- X: pandas dataframe of shape = [n_samples, n_features]
The input dataframe.
- y: None
y is not needed for this transformer. You can pass y or None.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
- X: array-like of shape (n_samples, n_features)
Input samples.
- y: array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params: dict
Additional fit parameters.
- Returns
- X_new: ndarray of shape (n_samples, n_features_new)
Transformed array.
- get_feature_names_out(input_features=None)
Get output feature names for transformation. In other words, returns the variable names of the transformed dataframe.
- Parameters
- input_features: array or list, default=None
This parameter exists only for compatibility with the Scikit-learn pipeline.
If None, then feature_names_in_ is used as the input feature names.
If an array or list, then input_features must match feature_names_in_.
- Returns
- feature_names_out: list
Transformed feature names.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
- deep: bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- params: dict
Parameter names mapped to their values.
- get_support(indices=False)
Get a mask, or integer index, of the features selected.
- Parameters
- indices: bool, default=False
If True, the return value will be an array of integers, rather than a boolean mask.
- Returns
- support: array
An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True if its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters
- **params: dict
Estimator parameters.
- Returns
- self: estimator instance
Estimator instance.