DropFeatures#
DropFeatures() drops a list of variables, indicated by the user, from the original dataframe. The user can pass a single variable as a string or a list of variables to be dropped. DropFeatures() offers similar functionality to pandas.DataFrame.drop, the difference being that DropFeatures() can be integrated into a Scikit-learn pipeline.
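For instance, because DropFeatures() follows the Scikit-learn API, it can be placed as a step in a Pipeline, directly before an estimator. The following is a minimal sketch, not part of the worked example below; the choice of classifier is illustrative, and the remaining preprocessing steps a real pipeline would need are omitted:
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from feature_engine.selection import DropFeatures

# illustrative pipeline: first drop unwanted columns, then fit a model
pipe = Pipeline([
    ("drop_features", DropFeatures(features_to_drop=["ticket", "name"])),
    ("classifier", LogisticRegression()),
])

# pipe.fit(X_train, y_train)  # DropFeatures removes the columns before the model sees the data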
When is this transformer useful?
Sometimes, we create new variables by combining other variables in the dataset. For example, we can obtain the variable age by subtracting date_of_birth from date_of_application. Once we have obtained the new variable, we no longer need the date variables in the dataset. Thus, we can add DropFeatures() to the Pipeline to have them removed.
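A minimal sketch of that workflow, using pandas to derive the new variable and DropFeatures() to remove the date columns afterwards; the dataframe and its values are hypothetical:
import pandas as pd
from feature_engine.selection import DropFeatures

# hypothetical dataframe with two date columns
df = pd.DataFrame({
    "date_of_birth": pd.to_datetime(["1980-05-01", "1992-11-23"]),
    "date_of_application": pd.to_datetime(["2020-01-15", "2021-06-30"]),
})

# derive the new variable: approximate age in years at application time
df["age"] = (df["date_of_application"] - df["date_of_birth"]).dt.days / 365.25

# the date columns are no longer needed, so drop them
df = DropFeatures(
    features_to_drop=["date_of_birth", "date_of_application"],
).fit_transform(df)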
Example
Let’s see how to use DropFeatures() in an example with the Titanic dataset. We first load the data and separate it into train and test sets:
from sklearn.model_selection import train_test_split
from feature_engine.datasets import load_titanic
from feature_engine.selection import DropFeatures

X, y = load_titanic(
    return_X_y_frame=True,
    handle_missing=True,
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0,
)

print(X_train.head())
Now, we go ahead and print the dataset column names:
X_train.columns
Index(['pclass', 'name', 'sex', 'age', 'sibsp', 'parch', 'ticket', 'fare',
       'cabin', 'embarked', 'boat', 'body', 'home.dest'],
      dtype='object')
Now, with DropFeatures() we can very easily drop a group of variables. Below, we set up the transformer to drop a list of 6 variables:
# set up the transformer
transformer = DropFeatures(
    features_to_drop=['sibsp', 'parch', 'ticket', 'fare', 'body', 'home.dest']
)

# fit the transformer
transformer.fit(X_train)
With fit(), this transformer does not learn any parameters. We can go ahead and remove the variables as follows:
train_t = transformer.transform(X_train)
test_t = transformer.transform(X_test)
If we now print the variable names of the transformed dataset, we see that the set of variables has been reduced:
train_t.columns
Index(['pclass', 'name', 'sex', 'age', 'cabin', 'embarked', 'boat'], dtype='object')
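As mentioned at the beginning, features_to_drop also accepts a single variable passed as a string. A minimal sketch, continuing with the transformed data from the previous step:
# drop one more column by passing its name as a string instead of a list
drop_boat = DropFeatures(features_to_drop="boat")
train_t = drop_boat.fit_transform(train_t)
test_t = drop_boat.transform(test_t)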
Additional resources#
In this Kaggle kernel we feature 3 different end-to-end machine learning pipelines that use DropFeatures().
All notebooks can be found in a dedicated repository.
For more details about this and other feature selection methods, check out these resources:

Feature Selection for Machine Learning
Or read our book:

Feature Selection in Machine Learning
Both our book and course are suitable for beginners and more advanced data scientists alike. By purchasing them you are supporting Sole, the main developer of Feature-engine.