Version 0.6.X
Version 0.6.1
Deployed: Friday, September 18, 2020
Contributors: Soledad Galli
Minor Changes:
- Updated docs: updated and expanded Contributing guidelines, added Governance, updated references to Feature-engine online.
- Updated Readme: updated and expanded readme.
Version 0.6.0
Deployed: Friday, August 14, 2020
Contributors: Michał Gromiec, Surya Krishnamurthy, Gleb Levitskiy, Karthik Kothareddy, Richard Cornelius Suwandi, Chris Samiullah, Soledad Galli
Major Changes:
- New Transformer: the MathematicalCombinator allows you to combine multiple features into new variables by performing mathematical operations like sum, product, mean, standard deviation, or finding the minimum and maximum values (by Michał Gromiec).
- New Transformer: the DropFeatures allows you to remove specified variables from a dataset (by Karthik Kothareddy).
- New Transformer: the DecisionTreeCategoricalEncoder encodes categorical variables with a decision tree (by Surya Krishnamurthy).
- Bug fix: the SklearnTransformerWrapper can now automatically select either numerical variables only, or numerical and categorical variables, depending on the Scikit-learn transformer being wrapped (by Michał Gromiec).
- Bug fix: the SklearnTransformerWrapper can now wrap Scikit-learn’s OneHotEncoder and concatenate the binary features back to the original dataframe (by Michał Gromiec).
- Added functionality: the ArbitraryNumberImputer can now take a dictionary of variable and arbitrary number pairs, to impute different variables with different numbers (by Michał Gromiec).
- Added functionality: the CategoricalVariableImputer can now replace missing data in categorical variables with a string defined by the user (by Gleb Levitskiy).
- Added functionality: the RareLabelEncoder now allows the user to set the maximum number of categories that the variable should have when grouping infrequent values (by Surya Krishnamurthy).
Minor Changes:
- Improved docs: fixed typos and tidied Readme.md (by Richard Cornelius Suwandi).
- Improved engineering practices: added Manifest.in to include md and license files in the tarball on PyPI (by Chris Samiullah).
- Improved engineering practices: updated the CircleCI YAML and created a release branch for orchestrated releases of new versions with significant changes (by Soledad Galli and Chris Samiullah).
- Improved engineering practices: added a test for the doc build in the CircleCI YAML (by Soledad Galli and Chris Samiullah).
- Transformer fix: removed the parameter return_object from the RareLabelEncoder, as it was not working as intended (by Karthik Kothareddy and Soledad Galli).
Version 0.5.0
Deployed: Friday, July 10, 2020
Contributors: Soledad Galli
Major Changes:
- Bug fix: fixed an error in the weight of evidence formula in the WoERatioCategoricalEncoder. The old formula, that is np.log(p(1) / p(0)), is preserved and can be obtained by setting encoding_method to 'log_ratio'. If encoding_method is set to 'woe', the correct formula is now applied.
- Added functionality: most categorical encoders now offer inverse_transform, to obtain the original values of the variable from the transformed dataset.
- Added functionality: the Winsorizer, OutlierTrimmer and ArbitraryOutlierCapper now have the option either to ignore missing values and obtain the parameters from the original variable distribution, or to raise an error if the dataframe contains NA. Select the behaviour by setting the parameter missing_values to 'ignore' or 'raise', respectively.
- New Transformer: the UserInputDiscretiser allows users to discretise numerical variables into arbitrarily defined buckets.
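The difference between the two encoding methods can be sketched in plain Python. The sketch rests on four assumptions:
- The target is binary (0/1).
- The weight of evidence follows the textbook definition ln(P(category | y=1) / P(category | y=0)).
- p(1) and p(0) in the old formula denote P(y=1 | category) and P(y=0 | category).
- Every category contains both classes. Otherwise a division by zero or a log of zero fails.

The function woe_and_log_ratio is hypothetical; this is not Feature-engine's implementation.

```python
import math
from collections import Counter


def woe_and_log_ratio(categories, target):
    """Per-category weight of evidence and log ratio for a 0/1 target (sketch).

    Hypothetical helper; assumes every category has both positives and negatives.
    """
    pos = Counter(c for c, y in zip(categories, target) if y == 1)
    neg = Counter(c for c, y in zip(categories, target) if y == 0)
    pos_total, neg_total = sum(pos.values()), sum(neg.values())
    woe, log_ratio = {}, {}
    for cat in set(categories):
        # WoE: share of all positives in this category over share of all negatives.
        woe[cat] = math.log((pos[cat] / pos_total) / (neg[cat] / neg_total))
        # Log ratio: P(y=1 | cat) over P(y=0 | cat), computed within the category.
        n = pos[cat] + neg[cat]
        log_ratio[cat] = math.log((pos[cat] / n) / (neg[cat] / n))
    return woe, log_ratio
```

Take categories A, A, A, B, B, B, B with target 1, 1, 0, 1, 0, 0, 0. Category A gets a WoE of ln(8/3) but a log ratio of ln(2). The two quantities coincide for every category only when the target is balanced overall.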
Version 0.4.3
Deployed: Friday, May 15, 2020
Contributors: Soledad Galli, Christopher Samiullah
Major Changes:
- New Transformer: the SklearnTransformerWrapper allows you to apply most Scikit-learn transformers to a subset of features only. It works with the SimpleImputer, the OrdinalEncoder and most scalers.
Minor Changes:
- Added functionality: the EqualFrequencyDiscretiser and EqualWidthDiscretiser can now return interval boundaries, as well as integers, to identify the bins. To return boundaries, set the parameter return_boundaries=True.
- Improved docs: added a contributing section, where you can find information on how to participate in the development of Feature-engine’s code base, and more.
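The idea behind return_boundaries can be sketched with plain-Python equal-width binning. Each value gets either an integer bin label or the (left, right) interval it falls into. The function equal_width_bins is hypothetical, and so is its convention of placing the maximum value in the last bin; neither is taken from Feature-engine.

```python
def equal_width_bins(values, bins, return_boundaries=False):
    """Assign each value to one of `bins` equal-width intervals (illustrative sketch)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("need at least two distinct values to build bins")
    width = (hi - lo) / bins
    # Bin edges: lo, lo + width, ..., hi.
    edges = [lo + i * width for i in range(bins + 1)]
    result = []
    for v in values:
        # Integer bin index; the maximum value is placed in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        # Either the interval boundaries or the integer label identifies the bin.
        result.append((edges[idx], edges[idx + 1]) if return_boundaries else idx)
    return result
```

For the values 0 to 10 and five bins, 10 is labelled 4 by default. With return_boundaries=True it is labelled (8.0, 10.0) instead.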
Version 0.4.0
Deployed: Monday, April 04, 2020
Contributors: Soledad Galli, Christopher Samiullah
Major Changes:
- Deprecated: the FrequentCategoryImputer was integrated into the class CategoricalVariableImputer. To perform frequent category imputation, now use: CategoricalVariableImputer(imputation_method='frequent').
- Renamed: the AddNaNBinaryImputer is now called AddMissingIndicator.
- New: the OutlierTrimmer was introduced into the package and allows you to remove outliers from the dataset.
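The entry does not say how the OutlierTrimmer decides what counts as an outlier. As an assumption for illustration only, this sketch uses the common interquartile range (IQR) rule. It drops any value outside [Q1 - fold * IQR, Q3 + fold * IQR]. The function trim_outliers_iqr and the default fold=1.5 are hypothetical.

```python
import statistics


def trim_outliers_iqr(values, fold=1.5):
    """Drop values outside the IQR-based limits (illustrative sketch)."""
    # Quartiles by linear interpolation over the observed data.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - fold * iqr, q3 + fold * iqr
    # Trimming removes the observations outside the limits instead of capping them.
    return [v for v in values if lower <= v <= upper]
```

Unlike capping, trimming shortens the dataset. In a dataframe, that means removing whole rows.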
Minor Changes:
- Improved: the EndTailImputer now has the additional option to place outliers at a factor of the maximum value.
- Improved: the FrequentCategoryImputer can now return numerical variables cast as object, in case you want to operate with them as if they were categorical. Set return_object=True.
- Improved: the RareLabelEncoder now allows the user to define the name of the label that will replace rare categories.
- Improved: all Feature-engine transformers (except missing data imputers) check that the datasets do not contain missing values.
- Improved: the LogTransformer will raise an error if a variable has zero or negative values.
- Improved: the ReciprocalTransformer now works with variables of type integer.
- Improved: the ReciprocalTransformer will raise an error if the variable contains the value zero.
- Improved: the BoxCoxTransformer will raise an error if the variable contains negative values.
- Improved: the OutlierCapper now finds and removes outliers based on percentiles.
- Improved: Feature-engine is now compatible with the latest releases of Pandas and Scikit-learn.
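The input checks described for the LogTransformer and ReciprocalTransformer can be sketched in plain Python on a list of numbers. The functions log_transform and reciprocal_transform are hypothetical stand-ins. They only mirror the stated rules: fail on zero or negative values for the log, fail on zero for the reciprocal, and accept integers for the reciprocal.

```python
import math


def log_transform(values):
    # The logarithm is undefined for zero and negative numbers, so fail early.
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    return [math.log(v) for v in values]


def reciprocal_transform(values):
    # 1/x is undefined at zero; integer inputs work because / returns a float.
    if any(v == 0 for v in values):
        raise ValueError("reciprocal transform requires non-zero values")
    return [1 / v for v in values]
```

Raising before transforming gives a clear error message, rather than a silent nan or inf partway through the data.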
Version 0.3.0
Deployed: Monday, August 05, 2019
Contributors: Soledad Galli
Major Changes:
- New: the RandomSampleImputer now has the option to set a single seed for batch imputation, or to set a seed per observation based on one or more additional numerical variables of that observation, which can be combined by multiplication or addition.
- New: the YeoJohnsonTransformer has been included to perform the Yeo-Johnson transformation of numerical variables.
- Renamed: the ExponentialTransformer is now called PowerTransformer.
- Improved: the DecisionTreeDiscretiser now allows you to provide a grid of parameters to tune the decision trees, which is done with a GridSearchCV under the hood.
- New: extended documentation for all of Feature-engine’s transformers.
- New: Quickstart guide to jump straight into how to use Feature-engine.
- New: Changelog to track what is new in Feature-engine.
- Updated: new Jupyter notebooks with examples on how to use Feature-engine’s transformers.
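Per-observation seeding in the RandomSampleImputer can be sketched in plain Python. The sketch assumes the seed is the sum or the product of the observation's additional numerical variables, truncated to an integer. The function random_sample_impute and its how parameter are hypothetical, and this is not Feature-engine's implementation.

```python
import math
import random


def random_sample_impute(values, seed_vars, how="add"):
    """Fill None entries with a random draw from the observed values (sketch).

    Each missing observation gets its own seed, derived from that row's
    additional numerical variables in `seed_vars` (hypothetical layout).
    """
    if how not in ("add", "multiply"):
        raise ValueError("how must be 'add' or 'multiply'")
    observed = [v for v in values if v is not None]
    imputed = []
    for v, row in zip(values, seed_vars):
        if v is not None:
            imputed.append(v)
            continue
        # Combine the row's variables into one seed, then truncate to an integer.
        seed = sum(row) if how == "add" else math.prod(row)
        rng = random.Random(int(seed))
        imputed.append(rng.choice(observed))
    return imputed
```

Because the seed depends only on the observation's own variables, two rows with the same seed variables always receive the same imputed value. This holds regardless of row order or batch composition.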
Minor Changes:
- Unified: dictionary attributes in transformers, which contain the transformation mappings, now end with _, for example binner_dict_.