Version 1.5.X#

Version 1.5.2#

Deployed: 21st November 2022

Contributors#

Gleb Levitski

Alfonso Tobar

pxn39

Soledad Galli

In this release, we expand the functionality of existing classes and the documentation.

New functionality#

The StringSimilarityEncoder can now create similarity variables based on keywords entered by the user (Gleb Levitski)

The Winsorizer and OutlierTrimmer now automatically adjust the value of the fold parameter based on the capping_method (pxn39)
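To illustrate the idea behind keyword-based similarity variables, here is a minimal sketch using only the Python standard library. It is not the StringSimilarityEncoder implementation: the helper names, the output column names and the use of `difflib.SequenceMatcher.ratio()` as the similarity measure are assumptions made for this example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1], computed as 2 * matched characters / total characters."""
    return SequenceMatcher(None, a, b).ratio()


def encode_with_keywords(values, keywords):
    """Build one similarity column per user-supplied keyword.

    The column names ("similarity_<keyword>") are hypothetical and chosen
    for this sketch only.
    """
    return {
        f"similarity_{kw}": [round(similarity(v, kw), 3) for v in values]
        for kw in keywords
    }


colours = ["dark blue", "light blue", "red"]
encoded = encode_with_keywords(colours, keywords=["blue", "red"])

for name, column in encoded.items():
    print(name, column)
```

Each category is compared against the keywords chosen by the user rather than against categories learned from the data, so the resulting variables stay the same regardless of which categories are most frequent in the training set.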

Bug fixes#

Fixes type check errors raised by newer versions (Gleb Levitski)

Documentation#

Add example code snippets to the categorical encoding API docs (Alfonso Tobar)

Add example code snippets to the imputation module API docs (Alfonso Tobar)

Add example code snippets to the discretisation module API docs (Alfonso Tobar)

Add example code snippets to the creation module API docs (Alfonso Tobar)

Add example code snippets to the datetime module API docs (Alfonso Tobar)

Update the user guide docs for the forecasting feature transformers (Soledad Galli)

Update the user guide docs for datetime features and cyclical features (Soledad Galli)

Fix badges in README (Gleb Levitski)

Version 1.5.0#

Deployed: 17th October 2022

Contributors#

Gleb Levitski

David Cortes

Alfonso Tobar

Morgan Sell

Soledad Galli

In this release, we fix a bug that made get_feature_names_out incompatible with the Scikit-learn pipeline.

In addition, thanks to Gleb Levitski, we’ve got a new encoder to replace categories by string similarity variables. Gleb Levitski also made a number of code enhancements to various transformers across the library, making a lot of new functionality available.

Finally, we’d like to thank Alfonso Tobar, David Cortes and Morgan Sell for creating new transformers, fixing bugs and expanding the functionality of Feature-engine.

Thank you so much to all contributors and to those of you who created issues flagging bugs or requesting new functionality.

New transformers#

StringSimilarityEncoder: encodes categorical variables based on string similarity (Gleb Levitski)

MatchCategories: matches the categories in the train and test sets for variables of type pandas categorical (David Cortes)

SelectByInformationValue: selects features based on the information value (Morgan Sell and Soledad Galli)
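As background for selection based on the information value, the sketch below computes the information value of a categorical feature against a binary target in plain Python. It is an illustration under stated assumptions, not the SelectByInformationValue code: the function name is hypothetical, the target is assumed to be coded as 0 and 1, and every category is assumed to contain observations of both classes (otherwise the logarithm is undefined).

```python
import math
from collections import Counter


def information_value(categories, target):
    """Sum over categories of (p - q) * ln(p / q).

    p is the share of all positive cases that fall in the category and
    q is the share of all negative cases that fall in the category.
    """
    pos_total = sum(target)
    neg_total = len(target) - pos_total
    pos = Counter(c for c, y in zip(categories, target) if y == 1)
    neg = Counter(c for c, y in zip(categories, target) if y == 0)

    iv = 0.0
    for c in set(categories):
        p = pos[c] / pos_total
        q = neg[c] / neg_total
        iv += (p - q) * math.log(p / q)
    return iv


# A feature whose categories separate the classes fairly well.
informative = information_value(["a"] * 4 + ["b"] * 4, [1, 1, 1, 0, 1, 0, 0, 0])

# A feature whose categories carry no information about the target.
uninformative = information_value(["a", "a", "b", "b"], [1, 0, 1, 0])

print(informative, uninformative)
```

A selector built on this quantity would keep the features whose information value exceeds a chosen threshold and drop the rest.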

New functionality#

The MeanEncoder can now implement smoothing during the encoding to handle high cardinality (Gleb Levitski)

The MeanEncoder can now encode unseen categories (Gleb Levitski)

The OrdinalEncoder can now encode unseen categories (Soledad Galli)

The CountFrequencyEncoder can now encode unseen categories (David Cortes)

All outlier transformers can now detect outliers based on the MAD rule (Gleb Levitski)

Add automatic calculation of PSI threshold in DropHighPSIFeatures (Gleb Levitski)

All feature selection transformers now have the method get_support() (Soledad Galli)
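The smoothing and unseen-category behaviour described above for mean encoding can be sketched as follows. This is a simplified illustration, not the MeanEncoder implementation: the function names are hypothetical, the weighting scheme `n / (n + smoothing)` is one common choice assumed here, and encoding unseen categories with the overall target mean is likewise an assumption of this sketch.

```python
from collections import defaultdict


def smoothed_mean_encoding(categories, target, smoothing=0.0):
    """Blend each category's target mean with the overall target mean.

    Categories with few observations are pulled towards the overall mean
    (the prior), which helps with high-cardinality variables.
    """
    prior = sum(target) / len(target)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, y in zip(categories, target):
        sums[c] += y
        counts[c] += 1

    mapping = {}
    for c, n in counts.items():
        weight = n / (n + smoothing)  # assumed weighting; equals 1 when smoothing is 0
        mapping[c] = weight * (sums[c] / n) + (1 - weight) * prior
    return mapping, prior


def encode(values, mapping, prior):
    """Replace categories by their encoded value; unseen ones get the prior."""
    return [mapping.get(v, prior) for v in values]


mapping, prior = smoothed_mean_encoding(
    ["a", "a", "a", "b"], [1, 1, 0, 0], smoothing=1.0
)
print(mapping)
print(encode(["a", "b", "c"], mapping, prior))  # "c" was not seen during fitting
```

With `smoothing=0.0` the mapping reduces to the plain per-category target mean, so the smoothing term only matters for categories with few observations.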

Bug fixes#

get_feature_names_out is now compatible with the Scikit-learn pipeline in all transformers (Soledad Galli)

The inverse_transform method in encoders now correctly handles unseen categories or raises not implemented errors (Soledad Galli)

Fixes output of SklearnTransformerWrapper for OneHotEncoder and PolynomialFeatures (Alfonso Tobar)

Documentation#

Add more resources to documentation (Soledad Galli)

User guide for StringSimilarityEncoder (Gleb Levitski)

New Jupyter notebook for StringSimilarityEncoder (Gleb Levitski)

User guide for SelectByInformationValue (Morgan Sell and Soledad Galli)

Deprecations#

The parameter errors in encoders is now replaced by the parameter unseen (Soledad Galli)

The classes MathematicalCombination, CombineWithFeatureReference and CyclicalTransformer are removed (Soledad Galli)

We are deprecating PRatioEncoder in version 1.5 and it will be removed in version 1.6 (Soledad Galli)

Code improvements#

Adds code coverage test (Soledad Galli)

Changes logic of encoding unseen categories to work with inverse_transform (Soledad Galli)

Increases code coverage for encoders (Soledad Galli)

Removes CategoricalInitExpandedMixin (Soledad Galli)

Removes checks for encoding dictionaries in all encoders (Soledad Galli)

Refactors creation module (Soledad Galli)

Refactors docstring module (Soledad Galli)

Refactors variable handling module (Soledad Galli)

Refactors numerical dictionary checks (Soledad Galli)

Refactors base transformers module (Soledad Galli)

Makes dataframe checks more performant (Soledad Galli)

Replaces pd.concat by pandas groupby in all target-based encoders (Soledad Galli)
