Version 1.3.X#
Version 1.3.0#
Deployed: 5th May 2022
Contributors#
In this release, we add the get_feature_names_out functionality to all our transformers! You asked for it, we delivered :)
In addition, we introduce a new module for time series forecasting. This module will host transformers that create features suitable for, well…, time series forecasting. We created three new transformers: LagFeatures, WindowFeatures and ExpandingWindowFeatures. We had extraordinary support from Kishan Manani, who is an experienced forecaster, and Morgan Sell, who helped us draft the new classes. Thank you both for the incredible work!
We also improved the functionality of our feature creation classes. To do this, we are
deprecating our former classes, MathematicalCombination and CombineWithFeatureReference,
which are a bit of a mouthful, for the new classes MathFeatures and RelativeFeatures.
We are also renaming the class CyclicalTransformer to CyclicalFeatures.
We’ve also enhanced the functionality of SelectByTargetMeanPerformance and
SklearnTransformerWrapper.
In addition, we’ve had some bug reports and bug fixes that we list below, and a number of enhancements to our current classes.
Thank you so much to all contributors for making this massive release possible!
New modules#
- timeseries-forecasting: this module hosts transformers that create features suitable for time series forecasting (Morgan Sell, Kishan Manani and Soledad Galli)
LagFeatures
WindowFeatures
ExpandingWindowFeatures
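For readers new to these feature types, the pandas sketch below illustrates what lag, window and expanding-window features look like. It is not the transformers' implementation or API: the `demand` column, the output names and the one-step shift used to keep only past data are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical daily series; the column name is illustrative only.
df = pd.DataFrame(
    {"demand": [10.0, 12.0, 11.0, 13.0, 15.0]},
    index=pd.date_range("2022-05-01", periods=5, freq="D"),
)

features = pd.DataFrame(index=df.index)

# Lag feature: the value observed one step earlier.
features["demand_lag_1"] = df["demand"].shift(1)

# Window feature: mean of the 3 previous observations
# (shifted by one step so the current value is not used).
features["demand_window_3_mean"] = df["demand"].rolling(3).mean().shift(1)

# Expanding-window feature: mean of all previous observations.
features["demand_expanding_mean"] = df["demand"].expanding().mean().shift(1)

print(features)
```

The first rows contain NaN because there is no past data to draw from; how those rows are handled is a modelling choice.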
New transformers#
LagFeatures: adds lag versions of the features (Morgan Sell, Kishan Manani and Soledad Galli)
WindowFeatures: creates features from operations on past time windows (Morgan Sell, Kishan Manani and Soledad Galli)
ExpandingWindowFeatures: creates features from operations on all past data (Kishan Manani)
MathFeatures: replaces MathematicalCombination and expands its functionality (Soledad Galli)
RelativeFeatures: replaces CombineWithFeatureReference and expands its functionality (Soledad Galli)
CyclicalFeatures: new name for CyclicalTransformer with same functionality (Soledad Galli)
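As a rough illustration of what cyclical features are, the sketch below encodes a periodic variable with a sine and a cosine term using plain NumPy and pandas. It is not the CyclicalFeatures implementation: the `month` column, the output names and the choice of the column maximum as the cycle length are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical month-of-year variable (1-12).
df = pd.DataFrame({"month": [1, 3, 6, 9, 12]})

# Assumption: the cycle length is the maximum value of the variable.
max_value = df["month"].max()

# Map each value onto the unit circle so that 12 and 1 end up close together.
df["month_sin"] = np.sin(2 * np.pi * df["month"] / max_value)
df["month_cos"] = np.cos(2 * np.pi * df["month"] / max_value)

print(df)
```

With this encoding, December and January are neighbours in feature space, which a raw 1 to 12 integer does not capture.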
New functionality#
All our transformers now have the get_feature_names_out functionality to obtain the names of the output features (Alejandro Giacometti, Morgan Sell and Soledad Galli)
SelectByTargetMeanPerformance now uses cross-validation and supports all possible performance metrics for classification and regression (Morgan Sell and Soledad Galli)
Enhancements#
All our feature selection transformers can now check that the variables were not dropped in a previous selection step (Gilles Verbockhaven)
The DecisionTreeDiscretiser and the DecisionTreeEncoder now check that the user enters a target suitable for regression or classification (Morgan Sell)
The DecisionTreeDiscretiser and the DecisionTreeEncoder now accept all sklearn cross-validation constructors (Soledad Galli)
The SklearnTransformerWrapper now implements the method inverse_transform (Soledad Galli)
The SklearnTransformerWrapper now supports additional transformers, for example, PolynomialFeatures (Soledad Galli)
The CategoricalImputer() now lets you know which variables have more than one mode (Soledad Galli)
The DatetimeFeatures() can now extract features from the dataframe index (Edoardo Argiolas)
Transformers that take y now check that X and y match (Noah Green and Ben Reiniger)
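To illustrate what transforming a subset of variables and then inverting the transformation means, the sketch below mimics the idea with plain scikit-learn and pandas. It does not use SklearnTransformerWrapper and is not its implementation; the dataframe and the choice of StandardScaler are assumptions.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy data with one numerical and one categorical variable (hypothetical names).
df = pd.DataFrame({"age": [20.0, 30.0, 40.0], "city": ["A", "B", "A"]})
variables = ["age"]  # subset of variables to transform

scaler = StandardScaler().fit(df[variables])

# transform: scale only the selected variables, leave the rest untouched.
df_t = df.copy()
df_t[variables] = scaler.transform(df[variables])

# inverse_transform: recover the original values of the selected variables.
df_back = df_t.copy()
df_back[variables] = scaler.inverse_transform(df_t[variables])

print(df_back)
```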
Bug fixes#
The SklearnTransformerWrapper now works with cross-validation when using the one hot encoder (Noah Green)
The SelectByShuffling now evaluates the initial performance and the performance after shuffling in the same data parts (Gilles Verbockhaven)
Discretisers: when setting return_boundaries=True the interval limits are now returned as strings and the variables as object data type (Soledad Galli)
DecisionTreeEncoder now enforces passing y to fit() (Soledad Galli)
DropMissingData can now take a string in the variables parameter (Soledad Galli)
DropFeatures now accepts a string as input of the features_to_drop parameter (Noah Green)
Categorical encoders now work correctly with numpy arrays as inputs (Noah Green and Ben Reiniger)
Documentation#
Improved user guide for SelectByTargetMeanPerformance with lots of tips for troubleshooting (Soledad Galli)
Added guides on how to use MathFeatures and RelativeFeatures (Soledad Galli)
Expanded user guide on how to use CyclicalFeatures with explanation and demos of what these features are (Soledad Galli)
Added a Jupyter notebook with a demo of the CyclicalFeatures class (Soledad Galli)
We now display all available methods in the documentation methods summary (Soledad Galli)
Fixed a typo in the ArbitraryNumberImputer documentation (Tim Vink)
Deprecations#
We are deprecating MathematicalCombination, CombineWithFeatureReference and CyclicalTransformer in version 1.3 and they will be removed in version 1.4
Feature-engine no longer works with Python 3.6 due to its dependence on the latest versions of Scikit-learn
In MatchColumns the attribute input_features_ was replaced by feature_names_in_ to adopt the Scikit-learn convention
Code improvements#
Imputers: removed looping over every variable to replace NaN. Now passing imputer dictionary to DataFrame.fillna() (Soledad Galli)
AddMissingIndicators: removed looping over every variable to add missing indicators. Now using pd.isna() (Soledad Galli)
CategoricalImputer now captures all modes in one go, without looping over variables (Soledad Galli)
Removed workaround to import docstrings for transform() method in various transformers (Soledad Galli)
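The loop-free approach described for the imputers and the missing indicators can be sketched with plain pandas as follows. This is an illustration, not the library's code: the dataframe, the replacement values and the `_na` suffix are assumptions.

```python
import numpy as np
import pandas as pd

# Toy data with missing values (hypothetical variable names).
df = pd.DataFrame({"age": [20.0, np.nan, 40.0], "city": ["A", None, "A"]})

# Hypothetical per-variable replacement values, as an imputer might learn in fit().
imputer_dict = {"age": 30.0, "city": "Missing"}

# One call replaces NaN in every listed column; no Python loop over variables.
df_imputed = df.fillna(value=imputer_dict)

# Missing indicators for several variables at once.
indicators = df[["age", "city"]].isna().astype(int).add_suffix("_na")

print(df_imputed)
print(indicators)
```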
For developers#
Created functions and docstrings for common descriptions of methods and attributes (Soledad Galli)
We introduce the use of common tests that are applied to all transformers (Soledad Galli)
Experimental#
New experimental, currently private module, prediction, which hosts the classes used by the SelectByTargetMeanPerformance
feature selection transformer. The estimators in this module have functionality that exceeds what the selector requires,
in that they can output estimates of the target by taking the average across a group of variables.
New private module, prediction with a regression and a classification estimator (Morgan Sell and Soledad Galli)
TargetMeanRegressor: estimates the target based on the average target mean value per class or interval, across variables (Morgan Sell and Soledad Galli)
TargetMeanClassifier: estimates the target based on the average target mean value per class or interval, across variables (Morgan Sell and Soledad Galli)
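The idea behind these estimators, as described above, can be sketched in plain pandas for categorical variables. This is a hypothetical illustration, not the private module's code: the data, the variable names and the `predict` helper are assumptions.

```python
import pandas as pd

# Toy training data with two categorical variables (hypothetical names).
X_train = pd.DataFrame(
    {"colour": ["red", "red", "blue", "blue"], "size": ["S", "L", "S", "L"]}
)
y_train = pd.Series([10.0, 20.0, 30.0, 40.0])

# Learn the mean target value per category, separately for each variable.
mappings = {
    var: y_train.groupby(X_train[var]).mean().to_dict()
    for var in X_train.columns
}


def predict(X):
    """Hypothetical helper: average the per-category target means across variables."""
    encoded = pd.DataFrame({var: X[var].map(mappings[var]) for var in mappings})
    return encoded.mean(axis=1)


X_new = pd.DataFrame({"colour": ["red", "blue"], "size": ["L", "L"]})
predictions = predict(X_new)
print(predictions.tolist())  # [22.5, 32.5]
```

For numerical variables, the same scheme would apply after sorting the values into intervals; that step is left out here.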