Version 1.8.X#

Version 1.8.3#

Deployed: 22nd Jan 2025

Contributors#

This release makes Feature-engine compatible with the latest version of scikit-learn, 1.6.0.

Code maintenance#

Version 1.8.2#

Deployed: 2nd Nov 2024

Contributors#

In this release, we add two new transformers: one to select features based on the MRMR (Maximum Relevance Minimum Redundancy) framework, and one to implement mean normalization.

Mean normalization is a scaling procedure not supported by Scikit-learn, so we thought we’d give it a go ourselves. MRMR is a feature selection method made popular by Uber, probably because it is quite fast to compute.
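To give a flavour of the idea (this is a conceptual sketch, not Feature-engine's implementation), MRMR selects features greedily: at each step it picks the candidate with the highest relevance to the target relative to its redundancy with the features already chosen. Here, relevance is measured with scikit-learn's mutual information and redundancy with absolute correlation, one of several possible choices:

```python
# Conceptual MRMR-style selection, NOT Feature-engine's implementation.
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Relevance: mutual information of each feature with the target.
relevance = pd.Series(mutual_info_classif(X, y, random_state=0), index=X.columns)
# Redundancy: absolute correlation between features.
corr = X.corr().abs()

selected = [relevance.idxmax()]      # start with the most relevant feature
while len(selected) < 5:             # pick 5 features for the example
    candidates = relevance.index.difference(selected)
    redundancy = corr.loc[candidates, selected].mean(axis=1)
    # MIQ-style score: relevance divided by redundancy (a difference also works).
    score = relevance[candidates] / redundancy
    selected.append(score.idxmax())

print(selected)
```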

In addition, we added new functionality to the forecasting features transformers and expanded their documentation.
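For context, the forecasting features transformers derive lag and window features from a time-indexed dataframe. Below is a minimal sketch of the kind of usage the expanded user guide covers; the parameter values are illustrative only:

```python
# Minimal sketch of a forecasting features transformer; values are illustrative.
import pandas as pd
from feature_engine.timeseries.forecasting import LagFeatures

rng = pd.date_range("2024-01-01", periods=6, freq="D")
df = pd.DataFrame({"sales": [10, 12, 11, 15, 14, 16]}, index=rng)

# Add 1- and 2-step lags of every numerical variable.
lagger = LagFeatures(periods=[1, 2])
print(lagger.fit_transform(df))
```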

Thank you very much to all contributors to this release!

If you value what we do, consider sponsoring us, so that we can keep updating Feature-engine at a fast pace.

New transformers#

  • MRMR() selects features based on the Maximum Relevance Minimum Redundancy framework (Soledad Galli)

  • MeanNormalizationScaler() scales features using mean normalization, which consists of subtracting the mean and dividing by the value range (Vasco Schiavo). See the usage sketch after this list.
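Below is a hedged usage sketch of the new scaler; the import path and the reliance on default parameters are assumptions, so check the API reference for the exact signature. The manual computation is shown alongside to make the formula explicit:

```python
# Hedged sketch: the import path and defaults are assumptions, check the API docs.
import pandas as pd
from feature_engine.scaling import MeanNormalizationScaler

X = pd.DataFrame({"age": [20, 35, 50, 65], "income": [2000, 3500, 4200, 8000]})

scaler = MeanNormalizationScaler()
X_scaled = scaler.fit_transform(X)

# Equivalent manual computation: (x - mean) / (max - min), per variable.
X_manual = (X - X.mean()) / (X.max() - X.min())
print(X_scaled, X_manual, sep="\n")
```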

Code improvements#

  • Add support for tz-aware columns to the variable handling functions, which is useful for the datetime module transformers (JCCalvoJackson). See the sketch below.
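As a quick illustration of what this enables, the sketch below passes a timezone-aware column through one of the variable handling functions; the function name find_datetime_variables is my assumption of the helper involved, so verify it against the variable_handling API docs:

```python
# Hedged sketch: the function name is an assumption, check feature_engine.variable_handling.
import pandas as pd
from feature_engine.variable_handling import find_datetime_variables

df = pd.DataFrame({
    "ts_utc": pd.date_range("2024-01-01", periods=3, freq="D", tz="UTC"),  # tz-aware
    "value": [1.0, 2.0, 3.0],
})

print(find_datetime_variables(df))  # should now pick up the tz-aware column
```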

Documentation#

  • Expand user guide for the forecasting features transformer (Gurjinder Kaur)

  • Fix parameter default value of DecisionTreeEncoder() (Michael Russell)

Version 1.8.1#

Deployed: 1st Sep 2024

Contributors#

In this release, we fix several bugs and future deprecation warnings from pandas and numpy. In addition, we expand the functionality of some feature selection classes to return the standard deviation of the derived feature importance.

We have also updated and expanded various pages of our documentation.

Thank you very much to all contributors to this release and to Vasco Schiavo and Gleb Levitski for actively discussing many of our PRs and issues.

If you value what we do, consider sponsoring us, so that we can keep updating Feature-engine at a fast pace.

Enhancements#

  • ProbeFeatureSelection can now also determine feature importance through single feature model performance (Soledad Galli)

  • ProbeFeatureSelection can now return the standard deviation of the feature importance (Soledad Galli)

  • RecursiveFeatureElimination and RecursiveFeatureAddition can now return the standard deviation of the feature importances (Soledad Galli)

  • SelectByShuffling, SelectBySingleFeaturePerformance and SelectByTargetMeanPerformance can now return the standard deviation of the feature importances (Soledad Galli). See the sketch after this list.

  • All feature selection classes can now implement group cross-validation through the groups parameter (Kanan Mahammadli)
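Here is a hedged sketch of retrieving the new standard deviation next to the per-feature performance that the selectors already expose; the name of the std attribute is an assumption, which is why it is accessed with getattr:

```python
# Hedged sketch: the std attribute name is an assumption (hence getattr);
# check the class docstring for the exact name added in this release.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from feature_engine.selection import SelectBySingleFeaturePerformance

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

sel = SelectBySingleFeaturePerformance(
    estimator=LogisticRegression(max_iter=1000),
    scoring="roc_auc",
    cv=3,
)
sel.fit(X, y)

print(sel.feature_performance_)                                   # mean CV performance per feature
print(getattr(sel, "feature_performance_std_", "not available"))  # assumed attribute name
```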

Bug fixes#

  • The cv parameter of the recursive feature selectors can now take cv generators of the type KFold.split(X, y) (Alessandro Benetti). See the sketch after this list.

  • The cv parameter of the remaining feature selection classes can now take cv generators of the type KFold.split(X, y) (Soledad Galli)

  • LogCpTransformer() adds a constant only to those variables that contain non-positive values during fit (Soledad Galli)

  • Fix bug in MatchVariables that prevented the transformer from working when missing values were set to raise an error (Soledad Galli)

  • Fix bug in inverse_transform() from YeoJohnsonTransformer() (Soledad Galli)

  • Fix pandas future warnings (Soledad Galli)

  • Fix numpy future warnings (olikra)
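The sketch below passes a cv generator to a selector, as enabled by the two fixes above; using GroupKFold here is my own illustration, and it also achieves the group cross-validation mentioned in the Enhancements section:

```python
# Sketch of passing a cv generator (e.g. GroupKFold.split) through the cv parameter;
# the dataset and the group assignment are made up for illustration.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold
from feature_engine.selection import RecursiveFeatureElimination

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
groups = np.arange(len(X)) % 5  # five dummy groups

cv_splits = GroupKFold(n_splits=3).split(X, y, groups)  # generator of train/test indices

sel = RecursiveFeatureElimination(
    estimator=LogisticRegression(max_iter=1000),
    scoring="roc_auc",
    cv=cv_splits,
)
X_reduced = sel.fit_transform(X, y)
print(X_reduced.shape)
```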

Code improvements#

  • Expand coverage of various tests (olikra)

Documentation#

Version 1.8.0#

Deployed: 26th May 2024

Contributors#

In this release, we make some breaking changes. The DecisionTreeEncoder() no longer has the encoding pipeline. In its place, we added an encoding_dict_ attribute that stores the mappings from each category to the prediction of the decision tree. This also allowed us to implement handling of unseen categories and the inverse_transform method.
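A short sketch of the reworked encoder on made-up data; encoding_dict_ and inverse_transform are the additions described above, while the name of the option that controls unseen categories is deliberately not shown here, so check the API reference for it:

```python
# Illustrative sketch of the reworked DecisionTreeEncoder; data is made up.
import pandas as pd
from feature_engine.encoding import DecisionTreeEncoder

X = pd.DataFrame({"city": ["London", "London", "London",
                           "Paris", "Paris", "Paris",
                           "Rome", "Rome", "Rome"]})
y = pd.Series([10.0, 11.0, 12.0, 20.0, 21.0, 22.0, 30.0, 31.0, 32.0])

enc = DecisionTreeEncoder(regression=True)  # continuous target in this example
enc.fit(X, y)

print(enc.encoding_dict_)            # mappings from category to tree prediction
X_enc = enc.transform(X)
print(enc.inverse_transform(X_enc))  # map the predictions back to the categories
```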

We also expanded the functionality of the DecisionTreeDiscretiser(), which can now replace the continuous attributes with the decision tree predictions, interval limits, or bin number.
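A minimal sketch with default settings, which replace the variables with the tree predictions; the parameter that switches the output to interval limits or bin numbers is not named here because its exact name should be checked in the API reference:

```python
# Sketch with default settings (output = tree predictions); the output-mode
# parameter mentioned above is not shown, check the API reference for its name.
import pandas as pd
from sklearn.datasets import make_regression
from feature_engine.discretisation import DecisionTreeDiscretiser

X_arr, y = make_regression(n_samples=200, n_features=3, random_state=0)
X = pd.DataFrame(X_arr, columns=["x1", "x2", "x3"])

disc = DecisionTreeDiscretiser(regression=True)
print(disc.fit_transform(X, y).head())
```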

In addition, we introduce a new transformer, the DecisionTreeFeatures(), which adds new features to the data derived from the predictions of decision trees trained on one or more of the existing features.
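A hedged sketch using default settings on synthetic data; the parameters that control which feature combinations the trees are trained on are left at their defaults here, so check the API reference before relying on them:

```python
# Hedged sketch with default settings; check the API reference for the
# parameters that control the feature combinations.
import pandas as pd
from sklearn.datasets import make_regression
from feature_engine.creation import DecisionTreeFeatures

X_arr, y = make_regression(n_samples=200, n_features=3, random_state=0)
X = pd.DataFrame(X_arr, columns=["x1", "x2", "x3"])

dtf = DecisionTreeFeatures(regression=True)  # continuous target in this example
X_new = dtf.fit_transform(X, y)
print(X_new.columns.tolist())  # original columns plus the tree-based features
```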

The classes from the outliers module can now automatically select the limits that define the outlier boundaries.

Finally, we have updated and expanded various pages of our documentation.

Thank you very much to all contributors to this release and to Vasco Schiavo and Gleb Levitski for actively reviewing many of our PRs.

If you value what we do, please consider sponsoring us, so that we can keep updating Feature-engine at a fast pace.

New#

  • DecisionTreeFeatures is a new transformer from the creation module that adds features based on the predictions of decision trees (Soledad Galli)

Enhancements#

  • DecisionTreeEncoder now supports encoding of unseen categories and inverse_transform, and provides an encoding dictionary instead of the pipeline (Soledad Galli, Gleb Levitski and Lorenzo Vitali)

  • The DecisionTreeDiscretiser() can now replace the continuous attributes with the decision tree predictions, interval limits, or bin number (Soledad Galli)

  • The OutlierTrimmer() and Winsorizer() can now adjust the strength of the outlier search automatically based on the statistical method (param fold="auto") (Gleb Levitski). See the sketch after this list.
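A short sketch of the automatic fold in action; the capping method and the data below are illustrative:

```python
# Sketch of fold="auto"; the capping method and data are illustrative.
import numpy as np
import pandas as pd
from feature_engine.outliers import Winsorizer

rng = np.random.default_rng(0)
df = pd.DataFrame({"x": np.append(rng.normal(0, 1, 500), [8.0, -9.0])})  # two outliers

capper = Winsorizer(capping_method="gaussian", tail="both", fold="auto")
df_capped = capper.fit_transform(df)
print(capper.left_tail_caps_, capper.right_tail_caps_)
```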

Documentation#