Version 1.1.X#

Version 1.1.2#

Deployed: 31st August 2021

Contributors#

  • Soledad Galli

This small release fixes a bug in how the OneHotEncoder handles binary categorical variables when the parameter drop_last_binary is set to True. It also ensures that the values in OneHotEncoder.encoder_dict_ are lists of categories and not arrays. Both bugs were introduced in v1.1.0.

Bug fix#

  • OneHotEncoder: drop_last_binary now outputs 1 dummy variable per binary variable when set to True
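As a rough illustration of the behaviour described above, the plain-Python sketch below encodes a binary variable as a single dummy column, since a second column would be its exact complement. It is not Feature-engine's implementation; the helper name one_dummy_for_binary and the sample data are hypothetical.

```python
def one_dummy_for_binary(values):
    """Encode a two-category variable as a single 0/1 dummy column."""
    # Keep categories in order of first appearance, stored as a plain list.
    categories = list(dict.fromkeys(values))
    if len(categories) != 2:
        raise ValueError("expected exactly 2 categories")
    kept = categories[0]  # the dummy flags membership in the first category
    return kept, [1 if v == kept else 0 for v in values]


kept, dummy = one_dummy_for_binary(["yes", "no", "no", "yes"])
print(kept, dummy)
```

The second category is implied by a 0 in the dummy column, so no information is lost by dropping it.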

Version 1.1.1#

Deployed: 6th August 2021

Contributors#

  • Miguel Trema Marrufo

  • Nicolas Galli

  • Soledad Galli

In this release, we add a new transformer, expand the functionality of 2 other transformers and migrate the repo to its own organisation!

Major changes#

New transformer#

  • LogCpTransformer: applies the logarithm transformation after adding a constant (Miguel Trema Marrufo)
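The transformation described above can be sketched in plain Python as log(x + C). This is only an illustration of the idea, not the LogCpTransformer code; the function name log_cp and the default constant are assumptions.

```python
import math


def log_cp(values, C=1.0):
    """Return log(x + C) for each x; C shifts the data so that x + C > 0."""
    shifted = [x + C for x in values]
    if min(shifted) <= 0:
        raise ValueError("x + C must be strictly positive for every x")
    return [math.log(s) for s in shifted]


transformed = log_cp([0.0, 1.0, 9.0], C=1.0)
print(transformed)
```

Adding the constant is what makes the logarithm usable on variables that contain zeros or small negative values.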

Minor changes#

  • Expands functionality of DropCorrelatedFeatures and SmartCorrelatedSelection to accept callables as a correlation function (Miguel Trema Marrufo)

  • Adds inverse_transform to all transformers from the transformation module (Nicolas Galli).
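To illustrate what passing a callable as the correlation function can look like, the sketch below defines a hypothetical pearson callable that takes two equal-length sequences and returns a float, plus a hypothetical drop_correlated helper that works with any such callable. The names, the callable signature and the threshold are assumptions made for this example; this is not the Feature-engine implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(columns, corr_func, threshold=0.8):
    """Keep a column only if |corr| with every already-kept column is <= threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(corr_func(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


data = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],    # perfectly correlated with "a"
    "c": [1, -1, 1, -1],  # weakly correlated with "a"
}
selected = drop_correlated(data, pearson)
print(selected)
```

Because the selection logic only calls corr_func(x, y), any other measure with the same signature can be swapped in without changing the helper.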

Documentation#

Version 1.1.0#

Deployed: 22nd June 2021

Contributors#

  • Hector Patino

  • Andrew Tan

  • Shubhmay Potdar

  • Agustin Firpo

  • Indy Navarro Vidal

  • Ashok Kumar

  • Chris Samiullah

  • Soledad Galli

In this release, we enforce compatibility with Scikit-learn by adding the check_estimator tests to all transformers in the package.

To pass the tests, we needed to modify some of the internal functionality of Feature-engine transformers and create new attributes. We tried to preserve backwards compatibility as much as possible.

Major changes#

  • Most transformers now have the additional attribute variables_, containing the variables that will be modified. The former attribute variables is retained. variables_ will almost always be identical to variables, except when the transformer is initialised with variables=None.

  • The parameter transformer in the SklearnTransformerWrapper and the parameter estimator in the SelectBySingleFeaturePerformance, SelectByShuffling, RecursiveFeatureElimination and RecursiveFeatureAddition are now compulsory and cannot be left blank when initialising the transformers.

  • Categorical encoders now support variables cast as category as well as object (Shubhmay Potdar and Soledad Galli)

  • Categorical encoders now have the parameter ignore_format to allow the transformer to work with any variable type, and not just object or categorical.

  • CategoricalImputer now has the parameter ignore_format to allow the transformer to work with any variable type, and not just object or categorical.

  • All transformers now have the new attribute n_features_in_, which captures the number of features in the dataset used to train the transformer (during fit()).

Minor changes#

  • Feature selection transformers now support all cross-validation schemes in the cv parameter, and not just an integer. That is, you can initialise the transformer with a Scikit-learn cross-validation splitter such as LeaveOneOut or StratifiedKFold, for example.

  • The OneHotEncoder includes additional functionality to return just 1 dummy variable for categorical variables that contain only 2 categories. In the new attribute variables_binary_ you can identify the original binary variables.

  • MathematicalCombination now supports the use of dataframes with null values (Agustin Firpo).

New transformer#

  • CyclicalTransformer: applies a cyclical transformation to numerical variables (Hector Patino)
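A common way to apply a cyclical transformation is to map each value to the sine and cosine of its angle on a circle, so that the end of the cycle sits next to its start. The sketch below assumes that form; the function name cyclical_features and the explicit max_value period are assumptions, and the code is not taken from the CyclicalTransformer.

```python
import math


def cyclical_features(values, max_value):
    """Map each value to (sin, cos) of its angle on a circle of period max_value."""
    angles = [2 * math.pi * v / max_value for v in values]
    return [math.sin(a) for a in angles], [math.cos(a) for a in angles]


hours = [0, 6, 12, 18]
sin_h, cos_h = cyclical_features(hours, max_value=24)
print(sin_h, cos_h)
```

With this encoding, hour 23 and hour 0 end up close together, which a raw numerical encoding cannot express.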

Code improvement#

  • Tests from check_estimator added to all transformers

  • Test for compatibility with Python 3.9 added to CircleCI (Chris Samiullah and Soledad Galli)

  • Automatic black formatting and linting added to tox

  • Additional code fixes (Andrew Tan and Indy Navarro Vidal).

Documentation#

  • Additional comparison tables for imputers and encoders.

  • Updates Readme with new badges and resources.

  • Expanded SklearnWrapper demos in Jupyter notebooks.

  • Expanded outlier transformer demos in Jupyter notebooks (Ashok Kumar)

  • Expanded Pipeline demos in Jupyter notebooks.

Community#

  • Created Gitter community to support users and foster knowledge exchange

Version 1.0.2#

Deployed: 22nd January 2021

Contributors#

  • Nicolas Galli

  • Pradumna Suryawanshi

  • Elamraoui Sohayb

  • Soledad Galli

New transformers#

  • CombineWithReferenceFeatures: applies mathematical operations between a group of variables and reference variables (by Nicolas Galli)

  • DropMissingData: removes missing observations from a dataset (Pradumna Suryawanshi)

Bug Fix#

  • Fix bugs in SelectByTargetMeanPerformance.

  • Fix documentation and Jupyter notebook typos.

Tutorials#

  • Creation: updated “how to” examples on how to combine variables into new features (by Elamraoui Sohayb and Nicolas Galli)

  • Kaggle Kernels: include links to Kaggle kernels

Version 1.0.1#

Deployed: 11th January 2021

Bug Fix#

  • Fix use of r2 in SelectBySingleFeaturePerformance and SelectByTargetMeanPerformance.

  • Fix documentation not showing properly in readthedocs.

Version 1.0.0#

Deployed: 31st December 2020

Contributors#

  • Ashok Kumar

  • Christopher Samiullah

  • Nicolas Galli

  • Nodar Okroshiashvili

  • Pradumna Suryawanshi

  • Sana Ben Driss

  • Tejash Shah

  • Tung Lee

  • Soledad Galli

In this version, we made a major overhaul of the package, with code quality improvement throughout the code base, unification of attributes and methods, addition of new transformers and extended documentation. Read below for more details.

New transformers for Feature Selection#

We included a whole new module with multiple transformers to select features.

  • DropConstantFeatures: removes constant and quasi-constant features from a dataframe (by Tejash Shah)

  • DropDuplicateFeatures: removes duplicated features from a dataset (by Tejash Shah and Soledad Galli)

  • DropCorrelatedFeatures: removes features that are correlated (by Nicolas Galli)

  • SmartCorrelatedSelection: selects features from a group of correlated features based on certain criteria (by Soledad Galli)

  • ShuffleFeaturesSelector: selects features by drop in machine learning model performance after feature’s values are randomly shuffled (by Sana Ben Driss)

  • SelectBySingleFeaturePerformance: selects features based on a ML model performance trained on individual features (by Nicolas Galli)

  • SelectByTargetMeanPerformance: selects features encoding the categories or intervals with the target mean and using that as proxy for performance (by Tung Lee and Soledad Galli)

  • RecursiveFeatureElimination: selects features recursively, evaluating the drop in ML performance, from the least to the most important feature (by Sana Ben Driss)

  • RecursiveFeatureAddition: selects features recursively, evaluating the increase in ML performance, from the most to the least important feature (by Sana Ben Driss)

Renaming of Modules#

Feature-engine transformers have been sorted into submodules to smooth the development of the package and shorten import syntax for users.

  • Module imputation: missing data imputers are now imported from feature_engine.imputation instead of feature_engine.missing_data_imputation.

  • Module encoding: categorical variable encoders are now imported from feature_engine.encoding instead of feature_engine.categorical_encoders.

  • Module discretisation: discretisation transformers are now imported from feature_engine.discretisation instead of feature_engine.discretisers.

  • Module transformation: transformers are now imported from feature_engine.transformation instead of feature_engine.variable_transformers.

  • Module outliers: transformers to remove or censor outliers are now imported from feature_engine.outliers instead of feature_engine.outlier_removers.

  • Module selection: new module hosts transformers to select or remove variables from a dataset.

  • Module creation: new module hosts transformers that combine variables into new features using mathematical or other operations.
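The module renames listed above can be applied mechanically to existing import lines. The sketch below is a hypothetical helper (migrate_import is not part of Feature-engine) that rewrites old module paths to the new ones; it takes the old encoders path in its dotted form, feature_engine.categorical_encoders, and it does not handle the class renames described in the next section.

```python
# Old module path -> new module path, as listed in the release notes.
MODULE_RENAMES = {
    "feature_engine.missing_data_imputation": "feature_engine.imputation",
    "feature_engine.categorical_encoders": "feature_engine.encoding",
    "feature_engine.discretisers": "feature_engine.discretisation",
    "feature_engine.variable_transformers": "feature_engine.transformation",
    "feature_engine.outlier_removers": "feature_engine.outliers",
}


def migrate_import(line):
    """Rewrite an import line that uses a pre-1.0.0 module path."""
    for old, new in MODULE_RENAMES.items():
        if old in line:
            return line.replace(old, new)
    return line  # nothing to migrate


old_line = "from feature_engine.discretisers import EqualWidthDiscretiser"
new_line = migrate_import(old_line)
print(new_line)
```

Lines that do not reference a renamed module are returned unchanged, so the helper can be run over a whole file.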

Renaming of Classes#

We shortened the name of categorical encoders, and also renamed other classes to simplify import syntax.

  • Encoders: the word Categorical was removed from the class names. Now, instead of MeanCategoricalEncoder, the class is called MeanEncoder. Instead of RareLabelCategoricalEncoder it is RareLabelEncoder, and so on. Please check the encoders documentation for more details.

  • Imputers: the CategoricalVariableImputer is now called CategoricalImputer.

  • Discretisers: the UserInputDiscretiser is now called ArbitraryDiscretiser.

  • Creation: the MathematicalCombinator is now called MathematicalCombination.

  • WoEEncoder and PRatioEncoder: the WoEEncoder now applies only encoding with the weight of evidence. To apply encoding by probability ratios, use a different transformer: the PRatioEncoder (by Nicolas Galli).
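To make the difference between the two encodings concrete, the sketch below computes both for a binary target using common textbook definitions: the weight of evidence as the log of the share of positives over the share of negatives in a category, and the probability ratio as p(1) / p(0) within the category. These definitions, the function name woe_and_pratio and the toy data are assumptions for illustration; the exact computation in Feature-engine may differ.

```python
import math
from collections import defaultdict


def woe_and_pratio(categories, target):
    """Return {category: (weight_of_evidence, probability_ratio)} for a 0/1 target."""
    pos, neg = defaultdict(int), defaultdict(int)
    for c, y in zip(categories, target):
        if y == 1:
            pos[c] += 1
        else:
            neg[c] += 1
    total_pos, total_neg = sum(pos.values()), sum(neg.values())

    result = {}
    for c in sorted(set(categories)):
        # Both quantities are undefined if a category has no positives or no negatives.
        woe = math.log((pos[c] / total_pos) / (neg[c] / total_neg))
        pratio = pos[c] / neg[c]  # equals p(1) / p(0) within the category
        result[c] = (woe, pratio)
    return result


cats = ["A", "A", "A", "A", "B", "B", "B", "B"]
y = [1, 1, 1, 0, 1, 0, 0, 0]
encoding = woe_and_pratio(cats, y)
print(encoding)
```

In this toy data the two classes are balanced overall, so the weight of evidence is simply the logarithm of the probability ratio; with imbalanced classes the two encodings diverge.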

Renaming of Parameters#

We renamed a few parameters to unify the nomenclature across the package.

  • EndTailImputer: the parameter distribution is now called imputation_method to unify convention among imputers. To impute using the IQR, we now need to pass imputation_method="iqr" instead of distribution="skewed".

  • AddMissingIndicator: the parameter missing_only now takes the boolean values True or False.

  • Winsorizer and OutlierTrimmer: the parameter distribution is now called capping_method to unify names across Feature-engine transformers.

Tutorials#

  • Imputation: updated “how to” examples of missing data imputation (by Pradumna Suryawanshi)

  • Encoders: new and updated “how to” examples of categorical encoding (by Ashok Kumar)

  • Discretisation: new and updated “how to” examples of discretisation (by Ashok Kumar)

  • Variable transformation: updated “how to” examples on how to apply mathematical transformations to variables (by Pradumna Suryawanshi)

For Contributors and Developers#

Code Architecture#

  • Submodules: transformers have been grouped within relevant submodules and modules.

  • Individual tests: testing classes have been subdivided into individual tests

  • Code Style: we adopted the use of flake8 for linting and PEP8 style checks, and black for automatic re-styling of code.

  • Type hints: we rolled out the use of type hints throughout classes and functions (by Nodar Okroshiashvili, Soledad Galli and Chris Samiullah)

Documentation#

  • Switched fully to numpydoc and away from Napoleon

  • Included more detail about methods, parameters, returns and raises, as per numpydoc docstring style (by Nodar Okroshiashvili, Soledad Galli)

  • Linked documentation to GitHub repository

  • Improved layout

Other Changes#

  • Updated documentation: documentation reflects the current use of Feature-engine transformers

  • Typo fixes: thank you to all who contributed to typo fixes (Tim Vink, GitHub user @piecot)