Feature-engine: A Python library for Feature Engineering for Machine Learning

Feature-engine rocks!

Feature-engine is a Python library with multiple transformers to engineer features for use in machine learning models. Like Scikit-learn, its transformers expose the methods fit() and transform(): fit() learns parameters from the data, and transform() uses those parameters to transform the data.

Feature-engine includes transformers for:

  • Missing data imputation

  • Categorical variable encoding

  • Discretisation

  • Variable transformation

  • Outlier capping or removal

  • Variable creation

  • Variable selection

Feature-engine allows you to select the variables you want to transform within each transformer. This way, different engineering procedures can be easily applied to different feature subsets.

Feature-engine transformers can be assembled within a Scikit-learn pipeline, making it possible to save and deploy a single object (.pkl) that contains the entire machine learning pipeline: the full sequence of variable transformations that leaves the raw data ready to be consumed by a machine learning algorithm, followed by the machine learning model itself. Check the quickstart for an example.

Would you like to know more about what is unique about Feature-engine?

This article provides a nice summary:

Installation

Feature-engine is a Python 3 package and works well with Python 3.6 or later. Earlier versions have not been tested. The simplest way to install Feature-engine is from PyPI with pip:

$ pip install feature-engine

Note that you can also install it using an underscore instead of a hyphen in the package name:

$ pip install feature_engine

Feature-engine is an active project and routinely publishes new releases. To upgrade Feature-engine to the latest version, use pip like this:

$ pip install -U feature-engine

If you’re using Anaconda, you can install the Feature-engine package from conda-forge:

$ conda install -c conda-forge feature_engine

Feature-engine’s Transformers

Missing Data Imputation: Imputers

  • MeanMedianImputer: replaces missing data in numerical variables by the mean or median

  • ArbitraryNumberImputer: replaces missing data in numerical variables by an arbitrary value

  • EndTailImputer: replaces missing data in numerical variables by numbers at the distribution tails

  • CategoricalImputer: replaces missing data in categorical variables with the string ‘Missing’ or by the most frequent category

  • RandomSampleImputer: replaces missing data with random samples of the variable

  • AddMissingIndicator: adds a binary missing indicator to flag observations with missing data

  • DropMissingData: removes rows containing NA values from the dataframe

Categorical Variable Encoders: Encoders

Numerical Variable Transformation: Transformers

Variable Discretisation: Discretisers

Outlier Capping or Removal

Scikit-learn Wrapper:

Mathematical Combination:

Feature Selection:

Getting Help

Can’t get something to work? Here are places where you can find help.

  1. The docs (you’re here!).

  2. Stack Overflow. If you ask a question, please tag it with “feature-engine”.

  3. If you are enrolled in the Feature Engineering for Machine Learning course on Udemy, post a question in a relevant section.

  4. Join our mailing list.

  5. Ask a question in the repo by filing an issue.

Found a Bug or have a suggestion?

Check if there’s already an open issue on the topic. If not, open a new issue with your bug report, suggestion or new feature request.

Contributing

Interested in contributing to Feature-engine? That is great news!

Feature-engine is a welcoming and inclusive project and it would be great to have you on board. We follow the Python Software Foundation Code of Conduct.

Regardless of your skill level, you can help us. We appreciate bug reports, user testing, feature requests, bug fixes, addition of tests, product enhancements, and documentation improvements. We also appreciate blog posts about Feature-engine. If you happen to have one, let us know!

For more details on how to contribute, check the contributing page. Click on the “Contributing” link on the left of this page.