Version 1.1.X#
Version 1.1.2#
Deployed: 31st August 2021
Contributors#
Soledad Galli
This small release fixes a bug in how the OneHotEncoder handles binary categorical variables when the parameter drop_last_binary is set to True. It also ensures that the values in OneHotEncoder.encoder_dict_ are lists of categories and not arrays. These bugs were introduced in v1.1.0.
Bug fix#
OneHotEncoder: drop_last_binary now outputs 1 dummy variable per binary variable when set to True
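To illustrate the fixed behaviour, here is a minimal pure-Python sketch that works on a dict of columns instead of a pandas DataFrame. It is not Feature-engine's implementation: fit_categories and transform are hypothetical helpers, and the variable_category naming of the dummies is an assumption. With drop_last_binary=True, a variable with exactly 2 categories yields a single dummy, and the learned categories are stored as plain lists.

```python
from typing import Dict, List


def fit_categories(
    columns: Dict[str, List[str]], drop_last_binary: bool = False
) -> Dict[str, List[str]]:
    """Learn which dummy variables to create for each categorical column.

    The result holds plain Python lists (not arrays), mirroring the fix
    that makes encoder_dict_ contain lists of categories.
    """
    encoder_dict: Dict[str, List[str]] = {}
    for name, values in columns.items():
        # Unique categories, in order of first appearance.
        categories = list(dict.fromkeys(values))
        if drop_last_binary and len(categories) == 2:
            # A binary variable needs only one dummy: the other is implied.
            categories = categories[:1]
        encoder_dict[name] = categories
    return encoder_dict


def transform(
    columns: Dict[str, List[str]], encoder_dict: Dict[str, List[str]]
) -> Dict[str, List[int]]:
    """Replace each categorical column with its dummy variables."""
    dummies: Dict[str, List[int]] = {}
    for name, categories in encoder_dict.items():
        for category in categories:
            # Hypothetical naming scheme: <variable>_<category>.
            dummies[f"{name}_{category}"] = [
                int(value == category) for value in columns[name]
            ]
    return dummies
```

For example, fitting on {"sex": ["M", "F", "F"], "city": ["A", "B", "C"]} with drop_last_binary=True learns {"sex": ["M"], "city": ["A", "B", "C"]}, so sex becomes a single dummy (sex_M) while city keeps three.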
Version 1.1.1#
Deployed: 6th August 2021
Contributors#
Miguel Trema Marrufo
Nicolas Galli
Soledad Galli
In this release, we add a new transformer, expand the functionality of 2 other transformers and migrate the repo to its own organisation!
Major changes#
Feature-engine is now hosted in its own GitHub organisation
New transformer#
LogCpTransformer: applies the logarithm transformation after adding a constant (Miguel Trema Marrufo)
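LogCpTransformer computes log(x + C). The sketch below shows that arithmetic in pure Python, together with the inverse exp(y) - C that an inverse_transform method would apply. It is an illustration, not the library's code: fit_c, log_cp and inverse_log_cp are hypothetical helpers, and choosing C automatically as abs(min(x)) + 1 is an assumption made here so that every shifted training value is at least 1.

```python
import math
from typing import List


def fit_c(values: List[float]) -> float:
    """Learn the constant C from the training data.

    Assumption: C = abs(min(x)) + 1, which makes every shifted training
    value x + C at least 1, so the logarithm is defined.
    """
    return abs(min(values)) + 1


def log_cp(values: List[float], c: float) -> List[float]:
    """Forward transformation: log(x + C)."""
    # New data may contain values below the training minimum.
    if any(x + c <= 0 for x in values):
        raise ValueError("x + C must be strictly positive")
    return [math.log(x + c) for x in values]


def inverse_log_cp(values: List[float], c: float) -> List[float]:
    """Inverse transformation: exp(y) - C recovers the original values."""
    return [math.exp(y) - c for y in values]
```

With x = [-2.0, 0.0, 3.0], the learned C is 3.0, the smallest value maps to log(1) = 0.0, and applying inverse_log_cp to the result recovers x up to floating-point error.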
Minor changes#
Expands the functionality of DropCorrelatedFeatures and SmartCorrelatedSelection to accept callables as a correlation function (Miguel Trema Marrufo).
Adds inverse_transform to all transformers from the transformation module (Nicolas Galli).
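Accepting a callable means any function that takes two columns and returns a correlation coefficient can be plugged in. The sketch below is a hypothetical, pure-Python illustration of that idea, not Feature-engine's code: pearson and drop_correlated are illustrative helpers, and the greedy rule "keep the first feature of a correlated pair, drop the second" is an assumption about how pairs are resolved.

```python
from itertools import combinations
from typing import Callable, Dict, List, Set


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def drop_correlated(
    data: Dict[str, List[float]],
    method: Callable[[List[float], List[float]], float],
    threshold: float = 0.8,
) -> List[str]:
    """Return the features to drop, using any correlation callable."""
    to_drop: Set[str] = set()
    for first, second in combinations(data, 2):
        # Features already marked for dropping do not remove others.
        if first in to_drop or second in to_drop:
            continue
        if abs(method(data[first], data[second])) > threshold:
            to_drop.add(second)
    # Report the dropped features in the original column order.
    return [name for name in data if name in to_drop]
```

Any other callable with the same shape, for instance a rank-based correlation, can be passed as method in place of pearson.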
Documentation#
Migrates main repo to Feature-engine’s GitHub organisation
Migrates example jupyter notebooks to separate repo
Adds Roadmap
Version 1.1.0#
Deployed: 22nd June 2021
Contributors#
Hector Patino
Andrew Tan
Shubhmay Potdar
Agustin Firpo
Indy Navarro Vidal
Ashok Kumar
Chris Samiullah
Soledad Galli
In this release, we enforce compatibility with Scikit-learn by adding the check_estimator tests to all transformers in the package.
To pass the tests, we needed to modify some of the internal functionality of Feature-engine transformers and create new attributes. We tried to avoid breaking backwards compatibility as much as possible.
Major changes#
Most transformers now have the additional attribute variables_, containing the variables that will be modified. The former attribute variables is retained. In almost all cases, variables_ is identical to variables, except when the transformer is initialised with variables=None.
The parameter transformer in the SklearnTransformerWrapper and the parameter estimator in SelectBySingleFeaturePerformance, SelectByShuffling, RecursiveFeatureElimination and RecursiveFeatureAddition are now compulsory, and cannot be left blank when initialising the transformers.
Categorical encoders now support variables cast as category as well as object (Shubhmay Potdar and Soledad Galli).
Categorical encoders now have the parameter ignore_format, to allow the transformer to work with any variable type, and not just object or categorical.
CategoricalImputer now has the parameter ignore_format, to allow the transformer to work with any variable type, and not just object or categorical.
All transformers now have the new attribute n_features_in_, which captures the number of features in the dataset used to train the transformer (during fit()).
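The split between variables and variables_ follows the scikit-learn convention that parameters passed to __init__ are stored unchanged, while attributes learned during fit() end with a trailing underscore; n_features_in_ is one such learned attribute. The hypothetical SketchTransformer below, working on a dict of columns, shows why the two attributes only differ when variables=None. Selecting only numerical columns in that case is an assumption for this illustration; which columns are picked depends on the transformer.

```python
from typing import Dict, List, Optional


class SketchTransformer:
    """Hypothetical transformer illustrating variables vs variables_."""

    def __init__(self, variables: Optional[List[str]] = None) -> None:
        # The user's input is stored unchanged, as scikit-learn expects.
        self.variables = variables

    def fit(self, X: Dict[str, list]) -> "SketchTransformer":
        if self.variables is None:
            # Assumption: with no user input, pick every numerical column.
            self.variables_ = [
                name
                for name, column in X.items()
                if all(isinstance(v, (int, float)) for v in column)
            ]
        else:
            self.variables_ = list(self.variables)
        # Number of columns in the data seen during fit().
        self.n_features_in_ = len(X)
        return self
```

Fitting SketchTransformer() on {"age": [25, 32], "income": [1.5, 2.0], "city": ["A", "B"]} leaves variables as None, sets variables_ to ["age", "income"] and n_features_in_ to 3.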
Minor changes#
Feature selection transformers now support all cross-validation schemes in the cv parameter, and not just an integer. For example, you can initialise the transformer with leave-one-out or stratified k-fold cross-validation.
The OneHotEncoder includes additional functionality, through the parameter drop_last_binary, to return just 1 dummy variable for categorical variables that contain only 2 categories. The new attribute variables_binary_ identifies the original binary variables.
MathematicalCombination now supports dataframes with null values (Agustin Firpo).
New transformer#
CyclicalTransformer: applies a cyclical transformation to numerical variables (Hector Patino)
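CyclicalTransformer maps a periodic variable, such as hour of day, onto a circle, so that the largest and smallest values end up close together. A minimal pure-Python sketch of the sine and cosine features follows; cyclical_encode is a hypothetical helper, and defaulting the period to the largest observed value is an assumption.

```python
import math
from typing import List, Optional, Tuple


def cyclical_encode(
    values: List[float], max_value: Optional[float] = None
) -> Tuple[List[float], List[float]]:
    """Map each value onto the unit circle as (sin, cos) features."""
    if max_value is None:
        # Assumption: the period defaults to the largest observed value.
        max_value = max(values)
    angles = [2 * math.pi * v / max_value for v in values]
    return [math.sin(a) for a in angles], [math.cos(a) for a in angles]
```

With a period of 24, hours 0 and 24 map to the same (sin, cos) point, whereas a plain numerical encoding would place them at opposite ends of the scale.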
Code improvement#
Tests from check_estimator added to all transformers
Test for compatibility with Python 3.9 added to CircleCI (Chris Samiullah and Soledad Galli)
Automatic code formatting with black and linting added to tox
Additional code fixes (Andrew Tan and Indy Navarro Vidal).
Documentation#
Additional comparison tables for imputers and encoders.
Updates Readme with new badges and resources.
Expanded SklearnWrapper demos in Jupyter notebooks.
Expanded outlier transformer demos in Jupyter notebooks (Ashok Kumar)
Expanded Pipeline demos in Jupyter notebooks.
Community#
Created Gitter community to support users and foster knowledge exchange
Version 1.0.2#
Deployed: 22nd January 2021
Contributors#
Nicolas Galli
Pradumna Suryawanshi
Elamraoui Sohayb
Soledad Galli
New transformers#
CombineWithReferenceFeatures: applies mathematical operations between a group of variables and reference variables (by Nicolas Galli)
DropMissingData: removes missing observations from a dataset (Pradumna Suryawanshi)
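CombineWithReferenceFeatures applies an operation between each variable in one group and each reference variable. The pure-Python sketch below illustrates the idea; combine_with_reference, the operation names in OPERATIONS and the <variable>_<operation>_<reference> naming are hypothetical choices for this example, not the transformer's documented API.

```python
import operator
from typing import Callable, Dict, List

# Hypothetical mapping from operation names to Python operators.
OPERATIONS: Dict[str, Callable[[float, float], float]] = {
    "sub": operator.sub,
    "div": operator.truediv,
    "add": operator.add,
    "mul": operator.mul,
}


def combine_with_reference(
    data: Dict[str, List[float]],
    variables: List[str],
    references: List[str],
    operations: List[str],
) -> Dict[str, List[float]]:
    """Apply each operation between every variable and every reference
    variable, row by row, and return only the new features."""
    new_features: Dict[str, List[float]] = {}
    for op_name in operations:
        op = OPERATIONS[op_name]
        for var in variables:
            for ref in references:
                new_features[f"{var}_{op_name}_{ref}"] = [
                    op(a, b) for a, b in zip(data[var], data[ref])
                ]
    return new_features
```

For instance, combining price and cost with the reference qty using "sub" and "div" produces four new features, such as price_sub_qty and cost_div_qty.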
Bug Fix#
Fix bugs in SelectByTargetMeanPerformance.
Fix documentation and jupyter notebook typos.
Tutorials#
Creation: updated “how to” examples on how to combine variables into new features (by Elamraoui Sohayb and Nicolas Galli)
Kaggle Kernels: include links to Kaggle kernels
Version 1.0.1#
Deployed: 11th January 2021
Bug Fix#
Fix use of r2 in SelectBySingleFeaturePerformance and SelectByTargetMeanPerformance.
Fix documentation not showing properly in readthedocs.
Version 1.0.0#
Deployed: 31st December 2020
Contributors#
Ashok Kumar
Christopher Samiullah
Nicolas Galli
Nodar Okroshiashvili
Pradumna Suryawanshi
Sana Ben Driss
Tejash Shah
Tung Lee
Soledad Galli
In this version, we made a major overhaul of the package, with code quality improvements throughout the code base, unification of attributes and methods, addition of new transformers and extended documentation. Read below for more details.
New transformers for Feature Selection#
We included a whole new module with multiple transformers to select features.
DropConstantFeatures: removes constant and quasi-constant features from a dataframe (by Tejash Shah)
DropDuplicateFeatures: removes duplicated features from a dataset (by Tejash Shah and Soledad Galli)
DropCorrelatedFeatures: removes features that are correlated (by Nicolas Galli)
SmartCorrelatedSelection: selects features from groups of correlated features based on certain criteria (by Soledad Galli)
ShuffleFeaturesSelector: selects features by the drop in machine learning model performance after each feature’s values are randomly shuffled (by Sana Ben Driss)
SelectBySingleFeaturePerformance: selects features based on the performance of an ML model trained on individual features (by Nicolas Galli)
SelectByTargetMeanPerformance: selects features by encoding the categories or intervals with the target mean and using that as a proxy for performance (by Tung Lee and Soledad Galli)
RecursiveFeatureElimination: selects features recursively, evaluating the drop in ML performance, from the least to the most important feature (by Sana Ben Driss)
RecursiveFeatureAddition: selects features recursively, evaluating the increase in ML performance, from the most to the least important feature (by Sana Ben Driss)
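DropConstantFeatures flags a feature as constant when one value covers all observations, and as quasi-constant when one value covers almost all of them. A minimal sketch of that check follows, assuming a tolerance tol expressed as the minimum fraction of rows taken by the most frequent value; find_constant_features is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List


def find_constant_features(data: Dict[str, list], tol: float = 1.0) -> List[str]:
    """Return features whose most frequent value covers at least `tol` of
    the rows. tol=1 finds constant features; a lower tol (e.g. 0.98) also
    finds quasi-constant ones."""
    to_drop = []
    for name, column in data.items():
        # Count of the single most frequent value in the column.
        most_common_count = Counter(column).most_common(1)[0][1]
        if most_common_count / len(column) >= tol:
            to_drop.append(name)
    return to_drop
```

On {"a": [1, 1, 1, 1], "b": [0, 0, 0, 1], "c": [1, 2, 3, 4]}, the default tol flags only a, while tol=0.75 also flags the quasi-constant b.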
Renaming of Modules#
Feature-engine transformers have been sorted into submodules to smooth the development of the package and shorten import syntax for users.
Module imputation: missing data imputers are now imported from feature_engine.imputation instead of feature_engine.missing_data_imputation.
Module encoding: categorical variable encoders are now imported from feature_engine.encoding instead of feature_engine.categorical_encoders.
Module discretisation: discretisation transformers are now imported from feature_engine.discretisation instead of feature_engine.discretisers.
Module transformation: variable transformers are now imported from feature_engine.transformation instead of feature_engine.variable_transformers.
Module outliers: transformers to remove or censor outliers are now imported from feature_engine.outliers instead of feature_engine.outlier_removers.
Module selection: new module hosts transformers to select or remove variables from a dataset.
Module creation: new module hosts transformers that combine variables into new features using mathematical or other operations.
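When upgrading older code, the module moves listed above amount to a search-and-replace on import paths. The helper below is a hypothetical sketch, not part of Feature-engine; its mapping repeats the old and new paths from this section, assuming the old encoder module was feature_engine.categorical_encoders. Class names changed too (see Renaming of Classes), so this only covers module paths.

```python
import re

# Module paths before 1.0.0 mapped to their new locations.
MODULE_RENAMES = {
    "feature_engine.missing_data_imputation": "feature_engine.imputation",
    "feature_engine.categorical_encoders": "feature_engine.encoding",
    "feature_engine.discretisers": "feature_engine.discretisation",
    "feature_engine.variable_transformers": "feature_engine.transformation",
    "feature_engine.outlier_removers": "feature_engine.outliers",
}


def migrate_imports(source: str) -> str:
    """Rewrite old Feature-engine module paths in a piece of source code."""
    for old, new in MODULE_RENAMES.items():
        # Word boundaries prevent partial matches such as 'my_feature_engine...'.
        source = re.sub(r"\b" + re.escape(old) + r"\b", new, source)
    return source
```

For example, "from feature_engine.discretisers import EqualWidthDiscretiser" becomes "from feature_engine.discretisation import EqualWidthDiscretiser", while lines that already use the new paths are left untouched.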
Renaming of Classes#
We shortened the names of the categorical encoders, and also renamed other classes to simplify import syntax.
Encoders: the word Categorical was removed from the class names. Now, instead of MeanCategoricalEncoder, the class is called MeanEncoder. Instead of RareLabelCategoricalEncoder, it is RareLabelEncoder, and so on. Please check the encoders documentation for more details.
Imputers: the CategoricalVariableImputer is now called CategoricalImputer.
Discretisers: the UserInputDiscretiser is now called ArbitraryDiscretiser.
Creation: the MathematicalCombinator is now called MathematicalCombination.
WoEEncoder and PRatioEncoder: the WoEEncoder now applies only encoding with the weight of evidence. To apply encoding by probability ratios, use a different transformer: the PRatioEncoder (by Nicolas Galli).
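The split between the two encoders reflects two different quantities. Assuming a binary target, the weight of evidence compares how a category is distributed among positive and negative cases, ln(P(X=x|Y=1) / P(X=x|Y=0)), while the probability ratio compares the two outcomes within the category, P(Y=1|X=x) / P(Y=0|X=x). The pure-Python sketch below computes both; woe_mapping and pratio_mapping are hypothetical helpers, and categories lacking positive or negative cases are not handled.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def _count(categories: List[str], target: List[int]) -> Tuple[Counter, Counter]:
    """Count positive (y == 1) and negative (y == 0) cases per category."""
    pos = Counter(c for c, y in zip(categories, target) if y == 1)
    neg = Counter(c for c, y in zip(categories, target) if y == 0)
    return pos, neg


def woe_mapping(categories: List[str], target: List[int]) -> Dict[str, float]:
    """Weight of evidence: ln(P(X=x | Y=1) / P(X=x | Y=0))."""
    pos, neg = _count(categories, target)
    total_pos, total_neg = sum(pos.values()), sum(neg.values())
    return {
        c: math.log((pos[c] / total_pos) / (neg[c] / total_neg))
        for c in dict.fromkeys(categories)
    }


def pratio_mapping(categories: List[str], target: List[int]) -> Dict[str, float]:
    """Probability ratio: P(Y=1 | X=x) / P(Y=0 | X=x).

    Both probabilities share the denominator count(x), so it cancels.
    """
    pos, neg = _count(categories, target)
    return {c: pos[c] / neg[c] for c in dict.fromkeys(categories)}
```

With categories ["A", "A", "A", "B", "B", "B"] and target [1, 1, 0, 1, 0, 0], the probability ratios are 2.0 for A and 0.5 for B, and the weights of evidence are ln 2 and -ln 2.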
Renaming of Parameters#
We renamed a few parameters to unify the nomenclature across the package.
EndTailImputer: the parameter distribution is now called imputation_method, to unify conventions among imputers. To impute using the IQR, we now need to pass imputation_method="iqr" instead of distribution="skewed".
AddMissingIndicator: the parameter missing_only now takes the boolean values True or False.
Winsorizer and OutlierTrimmer: the parameter distribution is now called capping_method, to unify names across Feature-engine transformers.
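With the renamed parameter, imputation_method selects how the end-of-distribution value is computed. The sketch below shows the "gaussian" and "iqr" options in pure Python. end_tail_value and impute_end_tail are hypothetical helpers, and restricting them to the right tail with a default fold of 3 are assumptions for this example rather than a description of EndTailImputer's internals.

```python
import statistics
from typing import List, Optional


def end_tail_value(
    values: List[float], imputation_method: str = "gaussian", fold: float = 3.0
) -> float:
    """Right-tail value used to replace missing data.

    Assumptions: only the right tail is handled and fold defaults to 3.
    """
    if imputation_method == "gaussian":
        # Mean plus `fold` sample standard deviations.
        return statistics.mean(values) + fold * statistics.stdev(values)
    if imputation_method == "iqr":
        # Third quartile plus `fold` times the inter-quartile range.
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        return q3 + fold * (q3 - q1)
    raise ValueError(f"unsupported imputation_method: {imputation_method!r}")


def impute_end_tail(values: List[Optional[float]], **kwargs) -> List[float]:
    """Replace None with the end-of-tail value learned from observed data."""
    observed = [v for v in values if v is not None]
    fill = end_tail_value(observed, **kwargs)
    return [fill if v is None else v for v in values]
```

For the observed values 1 to 5, the quartiles are 2.0 and 4.0, so imputation_method="iqr" fills the missing entry with 4.0 + 3 * 2.0 = 10.0.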
Tutorials#
Imputation: updated “how to” examples of missing data imputation (by Pradumna Suryawanshi)
Encoders: new and updated “how to” examples of categorical encoding (by Ashok Kumar)
Discretisation: new and updated “how to” examples of discretisation (by Ashok Kumar)
Variable transformation: updated “how to” examples on how to apply mathematical transformations to variables (by Pradumna Suryawanshi)
For Contributors and Developers#
Code Architecture#
Submodules: transformers have been grouped within relevant submodules and modules.
Individual tests: testing classes have been subdivided into individual tests
Code Style: we adopted the use of flake8 for linting and PEP8 style checks, and black for automatic re-styling of code.
Type hints: we rolled out the use of type hints throughout classes and functions (by Nodar Okroshiashvili, Soledad Galli and Chris Samiullah)
Documentation#
Switched fully to numpydoc and away from Napoleon
Included more detail about methods, parameters, returns and raises, as per numpydoc docstring style (by Nodar Okroshiashvili, Soledad Galli)
Linked documentation to the GitHub repository
Improved layout
Other Changes#
Updated documentation: documentation reflects the current use of Feature-engine transformers
Typo fixes: thank you to all who contributed to typo fixes (Tim Vink, GitHub user @piecot)