DatetimeOrdinal
DatetimeOrdinal() converts datetime variables into ordinal numbers, thereby
providing a numerical representation of the date. By default, it returns the proleptic
Gregorian ordinal of the date, where 1st January of year 1 has ordinal 1.
For example, 2nd January of year 1 has ordinal 2, 3rd January of year 1 has ordinal 3, and so on.
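Python's standard library exposes the same proleptic Gregorian numbering through datetime.date.toordinal() and its inverse, date.fromordinal(). The short check below illustrates the convention. Because numbering starts at 1 rather than 0, the number of days elapsed since 1st January of year 1 is the ordinal minus one.

```python
from datetime import date

# 1st January of year 1 is ordinal 1 in the proleptic Gregorian calendar.
first_day = date(1, 1, 1)
print(first_day.toordinal())        # 1
print(date(1, 1, 2).toordinal())    # 2

# fromordinal() is the inverse of toordinal().
print(date.fromordinal(738521))     # 2023-01-01

# Ordinals start counting at 1, so days elapsed since year 1 = ordinal - 1.
d = date(2023, 1, 1)
print(d.toordinal())                # 738521
print((d - first_day).days)         # 738520
```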
Optionally, DatetimeOrdinal() can compute the number of days relative to a
user-defined start_date. This can be useful for reducing the magnitude of the ordinal
values and for aligning them to a specific project timeline.
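Conceptually, a value relative to a start date is the difference between two ordinals, which equals the day count of the timedelta between the two dates. A minimal standard-library sketch, assuming the relative value is defined as the date's ordinal minus the start date's ordinal (the names start_date and event here are illustrative):

```python
from datetime import date

start_date = date(2010, 1, 1)   # illustrative project start
event = date(2012, 6, 21)       # illustrative event date

absolute = event.toordinal()                            # large magnitude
relative = event.toordinal() - start_date.toordinal()   # days since start_date

print(absolute, relative)       # 734675 902

# The same number comes from the timedelta between the two dates.
assert relative == (event - start_date).days
```

The relative value is far smaller than the absolute ordinal, yet it carries the same day-level information.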
Because ordinals are plain day counts, the difference between two ordinals equals the number of days between the corresponding dates (e.g., the time between two events). This allows algorithms to capture linear trends and temporal distances.
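A quick illustration with pandas Timestamps: the gaps between consecutive ordinals match the day gaps between the dates, and subtracting a constant such as a start date's ordinal leaves those gaps untouched. The dates are arbitrary examples.

```python
import pandas as pd

dates = pd.to_datetime(["2023-01-01", "2023-01-10", "2023-03-01"])
ordinals = pd.Series([d.toordinal() for d in dates])

# Gaps between consecutive ordinals (diff() yields floats because of the leading NaN).
ordinal_gaps = ordinals.diff().iloc[1:].tolist()
# Gaps computed directly from the timedeltas between consecutive dates.
day_gaps = [(later - earlier).days for earlier, later in zip(dates[:-1], dates[1:])]
print(ordinal_gaps)   # [9.0, 50.0]
print(day_gaps)       # [9, 50]

# Subtracting a constant (for example a start date's ordinal) keeps the gaps intact.
shifted = ordinals - pd.Timestamp("2023-01-01").toordinal()
print(shifted.tolist())   # [0, 9, 59]
```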
Datetime ordinals with pandas
In Python, we can get the Gregorian ordinal of a date by calling the toordinal()
method of a datetime object. pandas Timestamp objects support this method too, so we
can apply it to a datetime column as follows:
import pandas as pd
data = pd.DataFrame({"date": pd.to_datetime(["2023-01-01", "2023-01-10"])})
data["ordinal"] = data["date"].apply(lambda x: x.toordinal())
data
The output shows the new ordinal feature:
        date  ordinal
0 2023-01-01   738521
1 2023-01-10   738530
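In the variable ordinal, the value 738521 means that 2023-01-01 is the 738,521st day
of the proleptic Gregorian calendar. Because 1st January of year 1 has ordinal 1,
2023-01-01 falls 738,520 days after 1st January of year 1.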
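Series.apply() calls toordinal() once per row. For large, tz-naive datetime columns, one possible vectorized alternative is to count whole days from the Unix epoch and add the epoch's own ordinal. This is a sketch under that tz-naive assumption, not part of the original example. It relies on .dt.days flooring negative timedeltas (for example, minus one hour becomes -1 day), which matches how toordinal() discards the time of day.

```python
import pandas as pd

data = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01 00:00", "2023-01-10 12:30", "1969-12-31 23:00"])
})

EPOCH = pd.Timestamp("1970-01-01")
EPOCH_ORDINAL = EPOCH.toordinal()   # 719163

# Whole (floored) days since the epoch, shifted onto the proleptic Gregorian scale.
data["ordinal_vec"] = (data["date"] - EPOCH).dt.days + EPOCH_ORDINAL

# Row-wise reference, computed the same way as in the example above.
data["ordinal_apply"] = data["date"].apply(lambda x: x.toordinal())

print(data["ordinal_vec"].tolist())                          # [738521, 738530, 719162]
print((data["ordinal_vec"] == data["ordinal_apply"]).all())  # True
```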
Datetime ordinal with Feature-engine
DatetimeOrdinal() automatically converts one or more datetime variables into
ordinal numbers. It works with variables whose dtype is datetime, as well as with
object-type variables, provided that they can be parsed into datetime format.
Under the hood, DatetimeOrdinal() uses the toordinal() method of pandas Timestamp
objects. Its main functionalities are:
It can convert multiple datetime variables at once.
It can compute the ordinal number relative to a start_date.
It can automatically find and select datetime variables.
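The two helpers below, find_datetime_columns() and add_ordinals(), are hypothetical names invented for this sketch; they are not Feature-engine's implementation. They illustrate, with plain pandas, what automatic selection and multi-column conversion could look like. Datetime-typed columns are taken as they are, text columns are kept only if pd.to_datetime() can parse them, and each selected column yields a <column>_ordinal feature.

```python
import pandas as pd
from pandas.api.types import (
    is_datetime64_any_dtype,
    is_object_dtype,
    is_string_dtype,
)


def find_datetime_columns(df):
    """Hypothetical helper: datetime-typed columns plus text columns that parse as dates."""
    found = []
    for col in df.columns:
        if is_datetime64_any_dtype(df[col]):
            found.append(col)
        elif is_object_dtype(df[col]) or is_string_dtype(df[col]):
            try:
                pd.to_datetime(df[col])
                found.append(col)
            except (ValueError, TypeError):
                pass  # not parseable as dates, so leave the column alone
    return found


def add_ordinals(df, variables=None, drop_original=True):
    """Hypothetical helper: add one <column>_ordinal feature per date column."""
    df = df.copy()
    if variables is None:
        variables = find_datetime_columns(df)
    elif isinstance(variables, str):
        variables = [variables]
    for col in variables:
        dates = pd.to_datetime(df[col])
        df[f"{col}_ordinal"] = dates.apply(lambda x: x.toordinal())
    if drop_original:
        df = df.drop(columns=variables)
    return df


df = pd.DataFrame({
    "a": ["2020-01-01", "2020-01-02"],                   # text that parses as dates
    "b": pd.to_datetime(["2021-01-01", "2021-01-02"]),   # datetime dtype
    "n": [1, 2],                                         # numeric, never treated as a date
    "s": ["x", "y"],                                     # text that does not parse
})
print(find_datetime_columns(df))   # ['a', 'b']
print(add_ordinals(df))
```

Numeric columns are deliberately skipped, because pd.to_datetime() would happily reinterpret integers as epoch offsets.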
Example
First, let’s create a toy dataframe with 2 date variables:
import pandas as pd
from feature_engine.datetime import DatetimeOrdinal

toy_df = pd.DataFrame({
    "var_date1": ['May-1989', 'Dec-2020', 'Jan-1999', 'Feb-2002'],
    "var_date2": ['06/21/2012', '02/10/1998', '08/03/2010', '10/31/2020'],
    "other_var": [1, 2, 3, 4]
})
Now, we will set up the transformer to convert var_date2 into an ordinal feature.
dtfs = DatetimeOrdinal(variables="var_date2")
df_transf = dtfs.fit_transform(toy_df)
df_transf
We see the new ordinal feature in the output:
  var_date1  other_var  var_date2_ordinal
0  May-1989          1             734675
1  Dec-2020          2             729430
2  Jan-1999          3             733987
3  Feb-2002          4             737729
By default, DatetimeOrdinal() drops the original datetime variable. To keep
it, you can set drop_original=False.
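As a sanity check, the ordinals in the table above can be reproduced with plain pandas. This sketch parses var_date2 with an explicit month/day/year format and adds the new column without removing the original, which is a plain-pandas analogue of drop_original=False. It is an independent cross-check, not the transformer's code.

```python
import pandas as pd

toy_df = pd.DataFrame({
    "var_date1": ["May-1989", "Dec-2020", "Jan-1999", "Feb-2002"],
    "var_date2": ["06/21/2012", "02/10/1998", "08/03/2010", "10/31/2020"],
    "other_var": [1, 2, 3, 4],
})

# An explicit format avoids any day/month ambiguity when parsing the strings.
dates = pd.to_datetime(toy_df["var_date2"], format="%m/%d/%Y")

# assign() adds the ordinal column and keeps the original string column.
result = toy_df.assign(var_date2_ordinal=dates.apply(lambda x: x.toordinal()))
print(result["var_date2_ordinal"].tolist())   # [734675, 729430, 733987, 737729]
```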
Calculate days from a start date
DatetimeOrdinal() can also calculate the number of days elapsed since a
specific start_date.
dtfs = DatetimeOrdinal(
    variables="var_date2",
    start_date="2010-01-01"
)
df_transf = dtfs.fit_transform(toy_df)
df_transf
The new feature now represents the number of days between var_date2 and January 1st,
2010. Note that dates before the start_date will result in negative numbers.
  var_date1  other_var  var_date2_ordinal
0  May-1989          1                902
1  Dec-2020          2              -4343
2  Jan-1999          3                214
3  Feb-2002          4               3956
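Assuming, as the text describes, that the new feature is the number of days from start_date to each date (the date's ordinal minus the ordinal of start_date), the values can be computed with plain pandas in two equivalent ways. This is a cross-check sketch, not Feature-engine's implementation.

```python
import pandas as pd

dates = pd.to_datetime(
    pd.Series(["06/21/2012", "02/10/1998", "08/03/2010", "10/31/2020"]),
    format="%m/%d/%Y",
)
start = pd.Timestamp("2010-01-01")

# Date ordinal minus start-date ordinal; dates before start give negative values.
relative = dates.apply(lambda x: x.toordinal()) - start.toordinal()
print(relative.tolist())                  # [902, -4343, 214, 3956]

# Equivalent vectorized form: whole days of the timedelta to the start date.
print((dates - start).dt.days.tolist())   # [902, -4343, 214, 3956]
```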
Missing timestamps
DatetimeOrdinal() handles missing values (NaT) in datetime variables through
the missing_values parameter, which can be set to "raise" or "ignore".
If missing_values="raise", the transformer will raise an error if NaT values are
found in the datetime variables during fit() or transform().
If missing_values="ignore", the transformer will ignore NaT values, and the resulting
ordinal feature will contain NaN (or pd.NA) in their place.
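ordinal_with_missing() below is a hypothetical helper, not Feature-engine code; it sketches the two documented modes with plain pandas. In "raise" mode any NaT triggers an error. In "ignore" mode the missing entries become NaN, which turns the result into a float column because NaN is a float.

```python
import pandas as pd


def ordinal_with_missing(series, missing_values="raise"):
    """Hypothetical helper mimicking the 'raise' and 'ignore' modes."""
    if missing_values not in ("raise", "ignore"):
        raise ValueError("missing_values must be 'raise' or 'ignore'")
    dates = pd.to_datetime(series)
    if missing_values == "raise" and dates.isna().any():
        raise ValueError(f"{series.name} contains missing timestamps (NaT)")
    # NaT has no ordinal, so it is replaced by NaN; the result dtype becomes float.
    return dates.apply(lambda x: x.toordinal() if pd.notna(x) else float("nan"))


s = pd.Series(pd.to_datetime(["2020-01-01", None]), name="event_date")
print(ordinal_with_missing(s, missing_values="ignore").tolist())   # [737425.0, nan]

try:
    ordinal_with_missing(s)   # default "raise" mode
except ValueError as err:
    print(err)                # event_date contains missing timestamps (NaT)
```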
Additional resources
For tutorials on how to create and use features from datetime columns, check the following courses:
Feature Engineering for Machine Learning
Feature Engineering for Time Series Forecasting
Or read our book:
Python Feature Engineering Cookbook
Both our book and courses are suitable for beginners and more advanced data scientists alike. By purchasing them, you are supporting Sole, the main developer of Feature-engine.