Version 1.1.X ============= Version 1.1.2 ------------- Deployed: 31th August 2021 Contributors ~~~~~~~~~~~~ - Soledad Galli This small release fixes a Bug in how the OneHotEncoder handles binary categorical variables when the parameter `drop_last_binary` is set to True. It also ensures that the values in the `OneHotEncoder.encoder_dict_` are lists of categories and not arrays. These bugs were introduced in v1.1.0. Bug fix ~~~~~~~ - **OneHotEncoder**: drop_last_binary now outputs 1 dummy variable per binary variable when set to true Version 1.1.1 ------------- Deployed: 6th August 2021 Contributors ~~~~~~~~~~~~ - Miguel Trema Marrufo - Nicolas Galli - Soledad Galli In this release, we add a new transformer, expand the functionality of 2 other transformers and migrate the repo to its own organisation! Mayor changes ~~~~~~~~~~~~~ - Feature-engine is now hosted in its `own Github organisation `_ New transformer ~~~~~~~~~~~~~~~ - **LogCpTransformer**: applies the logarithm transformation after adding a constant (**Miguel Trema Marrufo**) Minor changes ~~~~~~~~~~~~~ - Expands functionality of `DropCorrelatedFeatures` and `SmartCorrelationSelectionFeature` to accept callables as a correlation function (**Miguel Trema Marrufo**) - Adds `inverse_transform` to all transformers from the transformation module (**Nicolas Galli**). Documentation ~~~~~~~~~~~~~ - Migrates main repo to `Feature-engine's Github organisation `_ - Migrates example jupyter notebooks to `separate repo `_ - Adds Roadmap Version 1.1.0 ------------- Deployed: 22st June 2021 Contributors ~~~~~~~~~~~~ - Hector Patino - Andrew Tan - Shubhmay Potdar - Agustin Firpo - Indy Navarro Vidal - Ashok Kumar - Chris Samiullah - Soledad Galli In this release, we enforce compatibility with Scikit-learn by adding the `check_estimator `_ tests to **all transformers** in the package. In order to pass the tests, we needed to modify some of the internal functionality of Feature-engine transformers and create new attributes. We tried not to break backwards compatibility as much as possible. Mayor changes ~~~~~~~~~~~~~ - Most transformers have now the additional attribute `variables_` containing the variables that will be modified. The former attribute `variables` is retained. `variables_` will almost always be identical to `variables` except when the transformer is initialised with `variables=None`. - The parameter `transformer` in the SklearnTransformerWrapper and the parameter `estimator` in the SelectBySingleFeaturePerformance, SelectByShuffling, RecursiveFeatureElimination and RecursiveFeatureAddition need a compulsory entry, and cannot be left blank when initialising the transformers. - Categorical encoders support now variables cast as `category` as well as `object` (**Shubhmay Potdar and Soledad Galli**) - Categorical encoders have now the parameter `ignore_format` to allow the transformer to work with any variable type, and not just object or categorical. - `CategoricalImputer` has now the parameter `ignore_format` to allow the transformer to work with any variable type, and not just object or categorical. - All transformers have now the new attribute `n_features_in` with captures the number of features in the dataset used to train the transformer (during fit()). Minor changes ~~~~~~~~~~~~~ - Feature selection transformers support now all cross-validation schemes in the `cv` parameter, and not just an integer. That is, you can initialize the transformer with LOOCV, or StratifiedCV for example. - The OneHotEncoder includes additional functionality to return just 1 dummy variable for categorical variables that contain only 2 categories. In the new attribute `variables_binary_` you can identify the original binary variables. - MathematicalCombinator now supports use of dataframes with null values (**Agustin Firpo**). New transformer ~~~~~~~~~~~~~~~ - **CyclicalTransformer**: applies a cyclical transformation to numerical variables (**Hector Patino**) Code improvement ~~~~~~~~~~~~~~~~ - Tests from check_estimator added to all transformers - Test for compatibility with Python 3.9 added to circleCI (**Chris Samiullah and Soledad Galli**) - Automatic black8 and linting added to tox - Additional code fixes (**Andrew Tan and Indy Navarro Vidal**). Documentation ~~~~~~~~~~~~~ - Additional comparison tables for imputers and encoders. - Updates Readme with new badges and resources. - Expanded SklearnWrapper demos in Jupyter notebooks. - Expanded outlier transformer demos in Jupyter notebooks (**Ashok Kumar**) - Expanded Pipeline demos in Jupyter notebooks. Community ~~~~~~~~~ - Created Gitter community to support users and foster knowledge exchange Version 1.0.2 ------------- Deployed: 22th January 2021 Contributors ~~~~~~~~~~~~ - Nicolas Galli - Pradumna Suryawanshi - Elamraoui Sohayb - Soledad Galli New transformers ~~~~~~~~~~~~~~~~ - **CombineWithReferenceFeatures**: applies mathematical operations between a group of variables and reference variables (**by Nicolas Galli**) - **DropMissingData**: removes missing observations from a dataset (**Pradumna Suryawanshi**) Bug Fix ~~~~~~~ - Fix bugs in SelectByTargetMeanPerformance. - Fix documentation and jupyter notebook typos. Tutorials ~~~~~~~~~ - **Creation**: updated "how to" examples on how to combine variables into new features (**by Elamraoui Sohayb and Nicolas Galli**) - **Kaggle Kernels**: include links to Kaggle kernels Version 1.0.1 ------------- Deployed: 11th January 2021 Bug Fix ~~~~~~~ - Fix use of r2 in SelectBySingleFeaturePerformance and SelectByTargetMeanPerformance. - Fix documentation not showing properly in readthedocs. Version 1.0.0 ------------- Deployed: 31st December 2020 Contributors ~~~~~~~~~~~~ - Ashok Kumar - Christopher Samiullah - Nicolas Galli - Nodar Okroshiashvili - Pradumna Suryawanshi - Sana Ben Driss - Tejash Shah - Tung Lee - Soledad Galli In this version, we made a major overhaul of the package, with code quality improvement throughout the code base, unification of attributes and methods, addition of new transformers and extended documentation. Read below for more details. New transformers for Feature Selection ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We included a whole new module with multiple transformers to select features. - **DropConstantFeatures**: removes constant and quasi-constant features from a dataframe (**by Tejash Shah**) - **DropDuplicateFeatures**: removes duplicated features from a dataset (**by Tejash Shah and Soledad Galli**) - **DropCorrelatedFeatures**: removes features that are correlated (**by Nicolas Galli**) - **SmartCorrelationSelection**: selects feature from group of correlated features based on certain criteria (**by Soledad Galli**) - **ShuffleFeaturesSelector**: selects features by drop in machine learning model performance after feature's values are randomly shuffled (**by Sana Ben Driss**) - **SelectBySingleFeaturePerformance**: selects features based on a ML model performance trained on individual features (**by Nicolas Galli**) - **SelectByTargetMeanPerformance**: selects features encoding the categories or intervals with the target mean and using that as proxy for performance (**by Tung Lee and Soledad Galli**) - **RecursiveFeatureElimination**: selects features recursively, evaluating the drop in ML performance, from the least to the most important feature (**by Sana Ben Driss**) - **RecursiveFeatureAddition**: selects features recursively, evaluating the increase in ML performance, from the most to the least important feature (**by Sana Ben Driss**) Renaming of Modules ~~~~~~~~~~~~~~~~~~~ Feature-engine transformers have been sorted into submodules to smooth the development of the package and shorten import syntax for users. - **Module imputation**: missing data imputers are now imported from ``feature_engine.imputation`` instead of ``feature_engine.missing_data_imputation``. - **Module encoding**: categorical variable encoders are now imported from ``feature_engine.encoding`` instead of ``feature_engine_categorical_encoders``. - **Module discretisation**: discretisation transformers are now imported from ``feature_engine.discretisation`` instead of ``feature_engine.discretisers``. - **Module transformation**: transformers are now imported from ``feature_engine.transformation`` instead of ``feature_engine.variable_transformers``. - **Module outliers**: transformers to remove or censor outliers are now imported from ``feature_engine.outliers`` instead of ``feature_engine.outlier_removers``. - **Module selection**: new module hosts transformers to select or remove variables from a dataset. - **Module creation**: new module hosts transformers that combine variables into new features using mathematical or other operations. Renaming of Classes ~~~~~~~~~~~~~~~~~~~ We shortened the name of categorical encoders, and also renamed other classes to simplify import syntax. - **Encoders**: the word ``Categorical`` was removed from the classes name. Now, instead of ``MeanCategoricalEncoder``, the class is called ``MeanEncoder``. Instead of ``RareLabelCategoricalEncoder`` it is ``RareLabelEncoder`` and so on. Please check the encoders documentation for more details. - **Imputers**: the ``CategoricalVariableImputer`` is now called ``CategoricalImputer``. - **Discretisers**: the ``UserInputDiscretiser`` is now called ``ArbitraryDiscretiser``. - **Creation**: the ``MathematicalCombinator`` is not called ``MathematicalCombination``. - **WoEEncoder and PRatioEncoder**: the ``WoEEncoder`` now applies only encoding with the weight of evidence. To apply encoding by probability ratios, use a different transformer: the ``PRatioEncoder`` (**by Nicolas Galli**). Renaming of Parameters ~~~~~~~~~~~~~~~~~~~~~~ We renamed a few parameters to unify the nomenclature across the Package. - **EndTailImputer**: the parameter ``distribution`` is now called ``imputation_method`` to unify convention among imputers. To impute using the IQR, we now need to pass ``imputation_method="iqr"`` instead of ``imputation_method="skewed"``. - **AddMissingIndicator**: the parameter ``missing_only`` now takes the boolean values ``True`` or ``False``. - **Winzoriser and OutlierTrimmer**: the parameter ``distribution`` is now called ``capping_method`` to unify names across Feature-engine transformers. Tutorials ~~~~~~~~~ - **Imputation**: updated "how to" examples of missing data imputation (**by Pradumna Suryawanshi**) - **Encoders**: new and updated "how to" examples of categorical encoding (**by Ashok Kumar**) - **Discretisation**: new and updated "how to" examples of discretisation (**by Ashok Kumar**) - **Variable transformation**: updated "how to" examples on how to apply mathematical transformations to variables (**by Pradumna Suryawanshi**) For Contributors and Developers ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Code Architecture ^^^^^^^^^^^^^^^^^ - **Submodules**: transformers have been grouped within relevant submodules and modules. - **Individual tests**: testing classes have been subdivided into individual tests - **Code Style**: we adopted the use of flake8 for linting and PEP8 style checks, and black for automatic re-styling of code. - **Type hint**: we rolled out the use of type hint throughout classes and functions (**by Nodar Okroshiashvili, Soledad Galli and Chris Samiullah**) Documentation ^^^^^^^^^^^^^ - Switched fully to numpydoc and away from Napoleon - Included more detail about methods, parameters, returns and raises, as per numpydoc docstring style (**by Nodar Okroshiashvili, Soledad Galli**) - Linked documentation to github repository - Improved layout Other Changes ~~~~~~~~~~~~~ - **Updated documentation**: documentation reflects the current use of Feature-engine transformers - **Typo fixes**: Thank you to all who contributed to typo fixes (Tim Vink, Github user @piecot)