Feature-engine: A Python library for Feature Engineering for Machine Learning

Feature-engine rocks!
Feature-engine is a Python library with multiple transformers to engineer features for use in machine learning models. Feature-engine follows the Scikit-learn API: each transformer has a fit() method that learns parameters from the data and a transform() method that applies them to transform the data.
Feature-engine includes transformers for:
Missing data imputation
Categorical variable encoding
Discretisation
Numerical variable transformation
Outlier capping or removal
Variable combination
Variable selection
Feature-engine allows you to select the variables you want to engineer or transform within each transformer. This way, different engineering procedures can be easily applied to different feature subsets.
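To illustrate the fit()/transform() pattern combined with variable selection, here is a minimal plain-Python sketch. It is not Feature-engine's implementation: the class name MedianImputerSketch, the dict-of-lists table standing in for a dataframe, and the use of None for missing values are all assumptions made for this example.

```python
from statistics import median


class MedianImputerSketch:
    """Toy imputer: learns medians in fit(), fills gaps in transform()."""

    def __init__(self, variables):
        # Only the columns named here are touched; all others pass through.
        self.variables = list(variables)

    def fit(self, data):
        # Learn one median per selected column, ignoring missing entries.
        self.medians_ = {
            name: median(v for v in data[name] if v is not None)
            for name in self.variables
        }
        return self

    def transform(self, data):
        # Return a new table; the input is left unmodified.
        out = {name: list(values) for name, values in data.items()}
        for name in self.variables:
            fill = self.medians_[name]
            out[name] = [fill if v is None else v for v in out[name]]
        return out


# Hypothetical training table: column name -> list of values.
train = {"age": [20, None, 40, 30], "income": [1000, None, 3000, 2000]}
imputer = MedianImputerSketch(variables=["age"]).fit(train)
result = imputer.transform(train)
```

Because only "age" was selected, the missing value in "income" is left as it is; a second transformer with a different strategy could handle that column.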
Feature-engine’s transformers can be assembled within a Scikit-learn pipeline, which makes it possible to save and deploy a single object (.pkl) containing the entire machine learning pipeline, that is, the full sequence of transformations that turns your raw data into data that can be fed to machine learning algorithms.
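The idea of chaining transformers into one object can be sketched in plain Python as follows. This is only an illustration of the pattern, not the Scikit-learn Pipeline class itself; LogStep, MeanCenterStep and PipelineSketch are hypothetical names invented for this example.

```python
import math


class LogStep:
    """Hypothetical step: natural log of one column (nothing to learn)."""

    def __init__(self, variable):
        self.variable = variable

    def fit(self, data):
        return self

    def transform(self, data):
        out = dict(data)
        out[self.variable] = [math.log(v) for v in data[self.variable]]
        return out


class MeanCenterStep:
    """Hypothetical step: subtracts the training mean from one column."""

    def __init__(self, variable):
        self.variable = variable

    def fit(self, data):
        values = data[self.variable]
        self.mean_ = sum(values) / len(values)
        return self

    def transform(self, data):
        out = dict(data)
        out[self.variable] = [v - self.mean_ for v in data[self.variable]]
        return out


class PipelineSketch:
    """Holds an ordered list of steps and applies them in sequence."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, data):
        # Each step is fitted on the output of the steps before it.
        for step in self.steps:
            data = step.fit(data).transform(data)
        return self

    def transform(self, data):
        for step in self.steps:
            data = step.transform(data)
        return data


train = {"price": [1.0, math.e, math.e ** 2]}
pipe = PipelineSketch([LogStep("price"), MeanCenterStep("price")]).fit(train)
centered = pipe.transform(train)["price"]
```

After fitting, the single `pipe` object carries every learned parameter, so it is the only thing that needs to be stored and applied to new raw data.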
Would you like to know more about what is unique about Feature-engine?
This article provides a nice summary: Feature-engine: A new open source Python package for feature engineering.
Installation
Feature-engine is a Python 3 package and works well with 3.6 or later. Earlier versions have not been tested. The simplest way to install Feature-engine is from PyPI with pip, Python’s preferred package installer:
$ pip install feature-engine
You can also install it using an underscore (_) in the package name:
$ pip install feature_engine
Feature-engine is an active project and routinely publishes new releases with new or updated transformers. In order to upgrade Feature-engine to the latest version, use pip like this:
$ pip install -U feature-engine
If you’re using Anaconda, you can take advantage of the conda utility to install the Anaconda Feature-engine package:
$ conda install -c conda-forge feature_engine
Feature-engine features in the following resources
Feature-engine: A new open-source Python package for feature engineering.
Practical Code Implementations of Feature Engineering for Machine Learning with Python.
In Spanish:
More resources will be added as they appear online!
Contributing
Interested in contributing to Feature-engine? That is great news!
Feature-engine is a welcoming and inclusive project and it would be great to have you on board. We follow the Python Software Foundation Code of Conduct.
Regardless of your skill level, you can help us. We appreciate bug reports, user testing, feature requests, bug fixes, additional tests, product enhancements, and documentation improvements.
We also appreciate blogs about Feature-engine. If you happen to have one, let us know!
For more details on how to contribute, see the “Contributing” page, listed in the “Table of Contents” on the left of this page.
Thank you for your contributions!
Feature-engine’s Transformers
Missing Data Imputation: Imputers
MeanMedianImputer: replaces missing data in numerical variables by the mean or median
ArbitraryNumberImputer: replaces missing data in numerical variables by an arbitrary value
EndTailImputer: replaces missing data in numerical variables by numbers at the distribution tails
CategoricalVariableImputer: replaces missing data in categorical variables with the string ‘Missing’ or by the most frequent category
RandomSampleImputer: replaces missing data with random samples of the variable
AddMissingIndicator: adds a binary missing indicator to flag observations with missing data
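As a rough illustration of two of the strategies listed above (a binary missing indicator, and filling categorical gaps with the string 'Missing' or the most frequent category), here is a plain-Python sketch. The function names, the "_na" suffix for the indicator column, and the use of None for missing values are assumptions made for this example, not the library's API.

```python
from collections import Counter


def add_missing_indicator(data, variable):
    """Add a 0/1 column flagging where `variable` is missing (None here)."""
    out = dict(data)
    out[variable + "_na"] = [1 if v is None else 0 for v in data[variable]]
    return out


def impute_categorical(values, strategy="missing"):
    """Fill None with the string 'Missing' or with the most frequent category."""
    if strategy == "missing":
        fill = "Missing"
    elif strategy == "frequent":
        fill = Counter(v for v in values if v is not None).most_common(1)[0][0]
    else:
        raise ValueError("strategy must be 'missing' or 'frequent'")
    return [fill if v is None else v for v in values]


data = {"colour": ["red", None, "blue", "red"]}
flagged = add_missing_indicator(data, "colour")
as_label = impute_categorical(data["colour"], strategy="missing")
as_mode = impute_categorical(data["colour"], strategy="frequent")
```

In a real workflow the fill value would be learned from the training set and then reused on new data, as in the fit()/transform() pattern; the functions here skip that separation for brevity.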
Categorical Variable Encoders: Encoders
OneHotCategoricalEncoder: performs one-hot encoding, optionally of the most popular categories only
CountFrequencyCategoricalEncoder: replaces categories by observation count or percentage
OrdinalCategoricalEncoder: replaces categories by numbers arbitrarily or ordered by target
MeanCategoricalEncoder: replaces categories by the target mean
WoERatioCategoricalEncoder: replaces categories by the weight of evidence
DecisionTreeCategoricalEncoder: replaces categories by predictions of a decision tree
RareLabelCategoricalEncoder: groups infrequent categories
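Two of the encodings above, frequency encoding and the grouping of infrequent categories, can be sketched in plain Python like this. The function names, the `tol` threshold and the "Rare" replacement label are illustrative assumptions, not a description of how the encoders are implemented.

```python
from collections import Counter


def fit_frequency_map(values):
    """Learn category -> share of observations from the training values."""
    counts = Counter(values)
    total = len(values)
    return {category: n / total for category, n in counts.items()}


def group_rare_labels(values, frequency_map, tol=0.2, rare_name="Rare"):
    """Replace categories whose learned share is below `tol`.

    Categories never seen during training are treated as rare as well.
    """
    return [
        v if frequency_map.get(v, 0.0) >= tol else rare_name
        for v in values
    ]


train = ["a", "a", "a", "b", "b", "c", "d", "a", "b", "a"]
freq = fit_frequency_map(train)            # learned once, on training data
encoded = [freq[v] for v in train]         # frequency encoding
grouped = group_rare_labels(train, freq)   # "c" and "d" fall below tol
```

Grouping rare labels before applying another encoder is a common ordering, since it keeps very small categories from producing unstable encoded values.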
Numerical Variable Transformation: Transformers
LogTransformer: performs logarithmic transformation of numerical variables
ReciprocalTransformer: performs reciprocal transformation of numerical variables
PowerTransformer: performs power transformation of numerical variables
BoxCoxTransformer: performs Box-Cox transformation of numerical variables
YeoJohnsonTransformer: performs Yeo-Johnson transformation of numerical variables
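The transformations listed above can be written out for a handful of values as follows. This is a sketch of the underlying formulas only: the helper `box_cox` is a hypothetical name, and its lambda is fixed by hand here rather than estimated from the data.

```python
import math


def box_cox(x, lmbda):
    """Box-Cox transform of a single positive value for a given lambda."""
    if x <= 0:
        raise ValueError("Box-Cox is only defined for positive values")
    if lmbda == 0:
        return math.log(x)
    return (x ** lmbda - 1) / lmbda


values = [1.0, 4.0, 9.0]
logged = [math.log(v) for v in values]            # logarithmic transformation
reciprocal = [1 / v for v in values]              # reciprocal transformation
square_root = [v ** 0.5 for v in values]          # power transformation, exponent 0.5
box_cox_half = [box_cox(v, 0.5) for v in values]  # Box-Cox with lambda = 0.5
```

The logarithm and Box-Cox require strictly positive inputs and the reciprocal is undefined at zero, which is why the sketch raises an error for non-positive values in `box_cox`.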
Variable Discretisation: Discretisers
EqualFrequencyDiscretiser: sorts the variable's values into equal-frequency intervals
EqualWidthDiscretiser: sorts the variable's values into contiguous intervals of equal width
DecisionTreeDiscretiser: uses decision trees to turn continuous variables into a finite set of values
UserInputDiscretiser: allows the user to arbitrarily define the intervals
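The difference between equal-width and equal-frequency intervals is easiest to see on a skewed variable. The sketch below is plain Python with hypothetical function names, and it uses a simple rank-based rule for the equal-frequency boundaries, which is an assumption and not necessarily the quantile definition the library uses.

```python
from bisect import bisect_right


def equal_width_edges(values, bins):
    """Inner boundaries splitting [min, max] into `bins` equal-width intervals."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    return [low + width * i for i in range(1, bins)]


def equal_frequency_edges(values, bins):
    """Inner boundaries placed so each interval holds about the same count."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // bins] for i in range(1, bins)]


def assign_bins(values, edges):
    """Map each value to the index of the interval it falls into."""
    return [bisect_right(edges, v) for v in values]


train = [1, 2, 3, 4, 5, 6, 7, 8, 100]   # one extreme value skews the range
width_bins = assign_bins(train, equal_width_edges(train, 3))
freq_bins = assign_bins(train, equal_frequency_edges(train, 3))
user_bins = assign_bins(train, [5, 50])  # boundaries chosen by the user
```

With equal-width intervals the single extreme value stretches the range, so almost every observation lands in the first interval, whereas the equal-frequency boundaries spread the observations evenly.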
Outlier Capping or Removal
Winsorizer: caps maximum or minimum values using statistical parameters
ArbitraryOutlierCapper: caps maximum and minimum values at user-defined values
OutlierTrimmer: removes outliers from the dataset
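Capping and trimming can be contrasted with a short plain-Python sketch. The limits here are learned as the mean plus or minus three standard deviations, which is one possible choice of "statistical parameters" assumed for this example; the function names are hypothetical.

```python
from statistics import mean, stdev


def fit_gaussian_limits(values, fold=3):
    """Learn capping limits as mean +/- fold * standard deviation."""
    centre, spread = mean(values), stdev(values)
    return centre - fold * spread, centre + fold * spread


def cap(values, lower, upper):
    """Pull values outside [lower, upper] back to the nearest limit."""
    return [min(max(v, lower), upper) for v in values]


def trim(values, lower, upper):
    """Drop values outside [lower, upper] instead of capping them."""
    return [v for v in values if lower <= v <= upper]


train = [10, 12, 11, 9, 13, 10, 11, 12, 9, 13]
lower, upper = fit_gaussian_limits(train)  # limits learned from training data

new_values = [5, 11, 40]
capped = cap(new_values, lower, upper)     # statistical limits
capped_by_hand = cap(new_values, 8, 14)    # user-defined limits
trimmed = trim(new_values, lower, upper)   # outliers removed entirely
```

Capping keeps the number of observations unchanged, while trimming shortens the dataset, so the latter also has to drop the matching rows of every other column and of the target.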
Scikit-learn Wrapper:
SklearnTransformerWrapper: applies Scikit-learn transformers only to a selected subset of features
Mathematical Combination:
MathematicalCombinator: applies basic mathematical operations across features
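Combining features with a basic mathematical operation amounts to applying that operation row by row across the chosen columns. A minimal plain-Python sketch, in which the function `combine`, its parameters and the dict-of-lists table are all assumptions made for illustration:

```python
def combine(data, variables, operation, new_name):
    """Add a column holding `operation` applied row-wise across `variables`."""
    operations = {
        "sum": sum,
        "mean": lambda row: sum(row) / len(row),
        "max": max,
        "min": min,
    }
    func = operations[operation]
    # zip(*columns) yields one tuple per row with the selected values.
    rows = zip(*(data[name] for name in variables))
    out = dict(data)
    out[new_name] = [func(row) for row in rows]
    return out


data = {"a": [1, 2], "b": [3, 6], "c": [10, 20]}
with_sum = combine(data, ["a", "b"], "sum", "a_b_sum")
with_mean = combine(data, ["a", "b"], "mean", "a_b_mean")
```

The column "c" is not part of the selection, so it is carried through untouched and does not contribute to the new features.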
Feature Selection:
DropFeatures: drops a subset of variables from a dataframe
Getting Help
Can’t get something to work? Here are places where you can find help.
The docs (you’re here!).
Stack Overflow. If you ask a question, please tag it with “feature-engine”.
If you are enrolled in the Feature Engineering for Machine Learning course on Udemy, post a question in the relevant section.
Join our mailing list.
Ask a question in the repo by filing an issue.
Found a Bug or have a suggestion?
Check if there’s already an open issue on the topic. If not, open a new issue with your bug report, suggestion or new feature request.
Open Source
Feature-engine is open source, released under the BSD 3-Clause license.
Feature-engine is hosted on GitHub. The issues and pull requests are tracked there.