.. _drop_features:

.. currentmodule:: feature_engine.selection

DropFeatures
============

:class:`DropFeatures()` drops a list of variables, indicated by the user, from the original
dataframe. The user can pass the variables to drop as a single string or as a list of strings.
:class:`DropFeatures()` offers functionality similar to
`pandas.dataframe.drop <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html>`_,
with the difference that :class:`DropFeatures()` can be integrated into a Scikit-learn pipeline.

**When is this transformer useful?**

Sometimes, we create new variables by combining other variables in the dataset. For example,
we obtain the variable `age` by subtracting `date_of_birth` from `date_of_application`. Once
we have derived the new variable, we no longer need the date variables in the dataset. Thus,
we can add :class:`DropFeatures()` to the Pipeline to have them removed (a minimal Pipeline
sketch is shown at the end of this page).

**Example**

Let's see how to use :class:`DropFeatures()` in an example with the Titanic dataset. We first
load the data and separate it into a train and a test set:

.. code:: python

    from sklearn.model_selection import train_test_split

    from feature_engine.datasets import load_titanic
    from feature_engine.selection import DropFeatures

    X, y = load_titanic(
        return_X_y_frame=True,
        handle_missing=True,
    )

    X_train, X_test, y_train, y_test = train_test_split(
        X,
        y,
        test_size=0.3,
        random_state=0,
    )

    print(X_train.head())

Now, we go ahead and print the dataset column names:

.. code:: python

    X_train.columns

.. code:: python

    Index(['pclass', 'name', 'sex', 'age', 'sibsp', 'parch', 'ticket', 'fare',
           'cabin', 'embarked', 'boat', 'body', 'home.dest'],
          dtype='object')

With :class:`DropFeatures()` we can very easily drop a group of variables. Below, we set up
the transformer to drop a list of 6 variables:

.. code:: python

    # set up the transformer
    transformer = DropFeatures(
        features_to_drop=['sibsp', 'parch', 'ticket', 'fare', 'body', 'home.dest']
    )

    # fit the transformer
    transformer.fit(X_train)

With `fit()`, this transformer does not learn any parameters. We can go ahead and remove the
variables as follows:

.. code:: python

    train_t = transformer.transform(X_train)
    test_t = transformer.transform(X_test)

If we now print the variable names of the transformed dataset, we see that it has been
reduced:

.. code:: python

    train_t.columns

.. code:: python

    Index(['pclass', 'name', 'sex', 'age', 'cabin', 'embarked', 'boat'], dtype='object')

Additional resources
--------------------

In this Kaggle kernel we feature 3 different end-to-end machine learning pipelines using
:class:`DropFeatures()`:

- `Kaggle Kernel `_

All notebooks can be found in a `dedicated repository `_.

For more details about this and other feature selection methods, check out these resources:

.. figure:: ../../images/fsml.png
   :width: 300
   :figclass: align-center
   :align: left
   :target: https://www.trainindata.com/p/feature-selection-for-machine-learning

   Feature Selection for Machine Learning

|
|
|
|
|
|
|
|
|
|

Or read our book:

.. figure:: ../../images/fsmlbook.png
   :width: 200
   :figclass: align-center
   :align: left
   :target: https://leanpub.com/feature-selection-in-machine-learning

   Feature Selection in Machine Learning

|
|
|
|
|
|
|
|
|
|
|
|
|
|

Both our book and course are suitable for beginners and more advanced data scientists alike.
By purchasing them you are supporting Sole, the main developer of Feature-engine.
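Finally, here is the Pipeline integration mentioned at the top of this page. This is a
minimal sketch rather than part of the original example: it chains :class:`DropFeatures()`
with :class:`DropConstantFeatures()` inside a Scikit-learn Pipeline, and the second step
together with its `tol` value is an illustrative assumption only.

.. code:: python

    from sklearn.pipeline import Pipeline

    from feature_engine.datasets import load_titanic
    from feature_engine.selection import DropConstantFeatures, DropFeatures

    X, y = load_titanic(return_X_y_frame=True, handle_missing=True)

    # chain the manual drop with a second, illustrative selection step;
    # DropConstantFeatures(tol=0.98) is an assumed example, not part of the guide above
    pipe = Pipeline([
        ("drop_manual", DropFeatures(
            features_to_drop=["sibsp", "parch", "ticket", "fare", "body", "home.dest"]
        )),
        ("drop_quasi_constant", DropConstantFeatures(tol=0.98)),
    ])

    # fit all steps and transform the data in one call
    X_t = pipe.fit_transform(X)

    print(X_t.columns)

Because both steps are transformers, the pipeline exposes `fit()` and `transform()` just like
the standalone :class:`DropFeatures()`, so it can be applied to the train and test sets in the
same way as shown above.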