Version 1.6.X#
Version 1.6.2#
Deployed: 18th September 2023
Contributors#
New functionality#
MatchVariables() can now also match the dtypes of the variables (Kyle Gilde)
DatetimeFeatures() and DatetimeSubtraction() can now specify the format of the datetime variables (Soledad Galli)
Add inverse_transform method to YeoJohnsonTransformer() (Giorgio Segalla)
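To make the idea of an inverse Yeo-Johnson transformation concrete, here is a minimal sketch in plain Python. It is not the code used by YeoJohnsonTransformer(); the function names yeo_johnson and yeo_johnson_inverse are hypothetical, and the sketch works on a single value with a lambda that is assumed to be already known.

```python
import math


def yeo_johnson(x: float, lmbda: float) -> float:
    """Forward Yeo-Johnson transform of a single value."""
    if x >= 0:
        if abs(lmbda) < 1e-12:  # lambda == 0
            return math.log1p(x)
        return ((x + 1.0) ** lmbda - 1.0) / lmbda
    if abs(lmbda - 2.0) < 1e-12:  # lambda == 2
        return -math.log1p(-x)
    return -((1.0 - x) ** (2.0 - lmbda) - 1.0) / (2.0 - lmbda)


def yeo_johnson_inverse(t: float, lmbda: float) -> float:
    """Recover the original value from a transformed value.

    The transform preserves the sign of its input, so the sign of t
    tells us which branch of the forward transform produced it.
    """
    if t >= 0:
        if abs(lmbda) < 1e-12:  # lambda == 0
            return math.expm1(t)
        return (lmbda * t + 1.0) ** (1.0 / lmbda) - 1.0
    if abs(lmbda - 2.0) < 1e-12:  # lambda == 2
        return -math.expm1(-t)
    return 1.0 - (1.0 - (2.0 - lmbda) * t) ** (1.0 / (2.0 - lmbda))
```

Applying yeo_johnson and then yeo_johnson_inverse with the same lambda should return the starting value, up to floating point error.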
Bug fixes#
These bugs were introduced by the latest releases of pandas, Scikit-learn and SciPy.
Fix failing test for YeoJohnsonTransformer() (Soledad Galli)
Fix failing test for RareLabelEncoder() (Soledad Galli)
Fix failing test for DatetimeFeatures() (Soledad Galli)
Fix failing test for many encoders: removed downcast=infer as it will be deprecated (Soledad Galli)
Fix version related failing style checks (Soledad Galli)
Fix version related failing type checks (Soledad Galli)
Fix version related failing doc checks (Soledad Galli)
Fix future warning in categorical imputation (Soledad Galli)
Code improvements#
Routine in DatetimeFeatures() no longer enters our check for utc=True when working with different timezones (Soledad Galli)
Improve performance in OneHotEncoder() (Soledad Galli)
Add check for duplicated variable names in dataframe (David Cortes)
Documentation#
Fix various typos in user guide (Soledad Galli)
Update readthedocs.yml file (Soledad Galli)
Add link to license in Readme (Darigov Research)
Version 1.6.1#
Deployed: 8th June 2023
Contributors#
In this release, we make Feature-engine compatible with pandas 2.0, extend the functionality of some transformers, and fix bugs introduced in the previous release.
Thank you so much to all contributors, to Gleb Levitski and Claudio Salvatore Arcidiacono for helping with reviews, and to those of you who created issues flagging bugs or requesting new functionality.
New functionality#
The Population Stability Index can now be used to evaluate categorical variables (dlaprins and Claudio Salvatore Arcidiacono)
RelativeFeatures has the option to add a constant to avoid dividing by zero (Morgan Sell and Soledad Galli)
SelectByShuffling now accepts sample weights (Soledad Galli)
WoEEncoder now lets you know which variables fail in the encoding (Soledad Galli)
WoEEncoder has the option to add a constant to avoid dividing by zero (Soledad Galli)
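The effect of adding a constant to the weight of evidence calculation can be pictured with the minimal sketch below. It is plain Python, not Feature-engine's implementation: the function name weight_of_evidence and the parameter name fill_value are assumptions made for illustration, and the sketch reports failing categories of a single variable rather than failing variables.

```python
import math
from collections import Counter


def weight_of_evidence(categories, target, fill_value=None):
    """Return {category: WoE} for a binary target (values 0 and 1).

    WoE = ln(share of positives in category / share of negatives in category).
    When either share is zero the logarithm is undefined, so the zero share
    is replaced by fill_value if one is given; otherwise an error lists the
    categories that could not be encoded.
    """
    pos = Counter(c for c, y in zip(categories, target) if y == 1)
    neg = Counter(c for c, y in zip(categories, target) if y == 0)
    total_pos, total_neg = sum(pos.values()), sum(neg.values())
    if total_pos == 0 or total_neg == 0:
        raise ValueError("target must contain both classes")

    woe, failing = {}, []
    for cat in sorted(set(categories)):
        p = pos[cat] / total_pos  # share of all positives in this category
        n = neg[cat] / total_neg  # share of all negatives in this category
        if p == 0 or n == 0:
            if fill_value is None:
                failing.append(cat)
                continue
            p = p if p > 0 else fill_value
            n = n if n > 0 else fill_value
        woe[cat] = math.log(p / n)

    if failing:
        raise ValueError(f"WoE is undefined for categories: {failing}")
    return woe
```

With categories ["a", "a", "b", "b", "c"] and target [1, 0, 1, 0, 1], category "c" has no negative observations, so the call fails unless a small fill_value is passed.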
Bug fixes#
Fixed various bugs in RareLabelEncoder() (Soledad Galli)
Renamed transform method in base classes to check_transform_input_and_state, which fixed bugs raised when using set_output(transform="pandas") in various classes (Soledad Galli and Claudio Salvatore Arcidiacono)
Code improvements#
Made code base compatible with pandas 2.0 (Claudio Salvatore Arcidiacono)
Moved docstrings of selection transformers to docstrings module (Soledad Galli)
Version 1.6.0#
Deployed: 16th March 2023
Contributors#
In this release, we make Feature-engine transformers compatible with the set_output API from Scikit-learn, which was released in version 1.2.0. We also make Feature-engine compatible with the newest direction of pandas by removing the inplace functionality that our transformers use under the hood.
We introduce a major change: most of the categorical encoders can now encode variables even if they have missing data.
We are also releasing three brand new transformers: one for discretization, one for feature selection and one for operations between datetime variables.
We also made a major improvement in the performance of DropDuplicateFeatures and some smaller bug fixes here and there.
We’d like to thank all contributors for fixing bugs and expanding the functionality and documentation of Feature-engine.
Thank you so much to all contributors and to those of you who created issues flagging bugs or requesting new functionality.
New transformers#
ProbeFeatureSelection: introduces random features and selects variables whose importance is greater than the random ones (Morgan Sell and Soledad Galli)
DatetimeSubtraction: creates new features by subtracting datetime variables (Kyle Gilde and Soledad Galli)
GeometricWidthDiscretiser: sorts continuous variables into intervals determined by geometric progression (Gleb Levitski)
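As an illustration of intervals determined by a geometric progression, the sketch below builds interval edges whose widths grow by a fixed ratio and assigns values to them. It is plain Python and not the GeometricWidthDiscretiser code: the function names geometric_edges and assign_bin and the explicit ratio parameter are hypothetical, and the library derives its own constant from the data, which may lead to different limits.

```python
from bisect import bisect_right


def geometric_edges(low: float, high: float, bins: int, ratio: float) -> list:
    """Return bins + 1 edges between low and high.

    Each interval is `ratio` times wider than the previous one, so the
    widths form a geometric progression that sums to high - low.
    """
    if bins < 1 or ratio <= 1 or high <= low:
        raise ValueError("need bins >= 1, ratio > 1 and high > low")
    # Sum of the widths w, w*r, ..., w*r**(bins-1) must equal high - low.
    first_width = (high - low) * (ratio - 1) / (ratio ** bins - 1)
    edges = [low]
    for i in range(bins - 1):
        edges.append(edges[-1] + first_width * ratio ** i)
    edges.append(high)  # pin the last edge to avoid floating point drift
    return edges


def assign_bin(value: float, edges: list) -> int:
    """Return the index of the interval containing value.

    Intervals are closed on the left; values outside the range fall
    into the first or last interval.
    """
    return bisect_right(edges[1:-1], value)
```

For example, geometric_edges(0, 15, 4, ratio=2) produces intervals of width 1, 2, 4 and 8.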
New functionality#
Allow categorical encoders to encode variables with NaN (Soledad Galli)
Make transformers compatible with new set_output functionality from sklearn (Soledad Galli)
The ArbitraryDiscretiser() now includes the lowest limits in the intervals (Soledad Galli)
New modules#
New Datasets module with functions to load specific datasets (Alfonso Tobar)
New variable_handling module with functions to automatically select numerical, categorical, or datetime variables (Soledad Galli)
Bug fixes#
Fixed bug in DropFeatures() (Luís Seabra)
Fixed bug in RecursiveFeatureElimination() caused when only 1 feature remained in data (Soledad Galli)
Documentation#
Add example code snippets to the selection module API docs (Alfonso Tobar)
Add example code snippets to the outlier module API docs (Alfonso Tobar)
Add example code snippets to the transformation module API docs (Alfonso Tobar)
Add example code snippets to the time series module API docs (Alfonso Tobar)
Add example code snippets to the preprocessing module API docs (Alfonso Tobar)
Add example code snippets to the wrapper module API docs (Alfonso Tobar)
Updated documentation using new Dataset module (Alfonso Tobar and Soledad Galli)
Reorganized Readme badges (Gleb Levitski)
New Jupyter notebooks for GeometricWidthDiscretiser (Gleb Levitski)
Fixed typos (Gleb Levitski)
Remove examples using the Boston house dataset (Soledad Galli)
Update sponsor page and contribute page (Soledad Galli)
Deprecations#
The class PRatioEncoder is no longer supported and was removed from the API (Soledad Galli)
Code improvements#
Massive improvement in the performance (speed) of DropDuplicateFeatures() (Nodar Okroshiashvili)
Remove inplace and other issues related to pandas' new direction (Luís Seabra)
Move most docstrings to dedicated docstrings module (Soledad Galli)
Unnest tests for encoders (Soledad Galli)