find_numerical_variables

find_numerical_variables() returns a list with the names of the numerical variables in the dataset.

Let’s create a toy dataset with numerical, categorical and datetime variables:

import pandas as pd
df = pd.DataFrame({
    "Name": ["tom", "nick", "krish", "jack"],
    "City": ["London", "Manchester", "Liverpool", "Bristol"],
    "Age": [20, 21, 19, 18],
    "Marks": [0.9, 0.8, 0.7, 0.6],
    "dob": pd.date_range("2020-02-24", periods=4, freq="min"),
})

print(df.head())

We see the resulting dataframe below:

    Name        City  Age  Marks                 dob
0    tom      London   20    0.9 2020-02-24 00:00:00
1   nick  Manchester   21    0.8 2020-02-24 00:01:00
2  krish   Liverpool   19    0.7 2020-02-24 00:02:00
3   jack     Bristol   18    0.6 2020-02-24 00:03:00

With find_numerical_variables() we capture the names of all the numerical variables in a list. Let’s do that and then display the list:

from feature_engine.variable_handling import find_numerical_variables

var_num = find_numerical_variables(df)

var_num

We see the names of the numerical variables in the list below:

['Age', 'Marks']
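
As a cross-check, the same selection can be approximated with plain pandas. The snippet below is a minimal sketch that assumes "numerical" means the integer and float dtypes matched by `select_dtypes(include="number")`; it is not presented as feature_engine's own implementation.

```python
import pandas as pd

# Rebuild the toy dataset from above so the snippet is self-contained.
df = pd.DataFrame({
    "Name": ["tom", "nick", "krish", "jack"],
    "City": ["London", "Manchester", "Liverpool", "Bristol"],
    "Age": [20, 21, 19, 18],
    "Marks": [0.9, 0.8, 0.7, 0.6],
    "dob": pd.date_range("2020-02-24", periods=4, freq="min"),
})

# Keep only the columns with a numeric dtype (int64 and float64 here);
# the object and datetime columns are left out.
numeric_cols = list(df.select_dtypes(include="number").columns)
print(numeric_cols)
```

Inspecting `df.dtypes` alongside this result shows why "Name", "City" and "dob" are excluded: they are stored as object and datetime columns rather than as numbers.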

By default, find_numerical_variables() raises an error if the dataset contains no numerical variables. For example, find_numerical_variables(df[["Name", "City", "dob"]]) raises a TypeError because that subset of the data contains no numerical variables.

To return an empty list instead of raising an error when no numerical variables are found, set return_empty to True:

find_numerical_variables(df[["Name", "City", "dob"]], return_empty=True)

The previous command returns an empty list: [].
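
To make the two behaviours concrete, here is a pandas-only sketch of the same contract. The helper `numeric_columns` is hypothetical and is not part of feature_engine; its error message is illustrative, and it only mimics the behaviour described above (raise a TypeError by default, return an empty list when `return_empty=True`).

```python
import pandas as pd


def numeric_columns(frame, return_empty=False):
    """Hypothetical stand-in that mimics the behaviour described above."""
    cols = list(frame.select_dtypes(include="number").columns)
    if not cols and not return_empty:
        # Illustrative message, not the library's exact wording.
        raise TypeError("No numerical variables found in this dataframe.")
    return cols


df = pd.DataFrame({
    "Name": ["tom", "nick", "krish", "jack"],
    "City": ["London", "Manchester", "Liverpool", "Bristol"],
    "Age": [20, 21, 19, 18],
    "Marks": [0.9, 0.8, 0.7, 0.6],
    "dob": pd.date_range("2020-02-24", periods=4, freq="min"),
})
subset = df[["Name", "City", "dob"]]

# Default mode: the absence of numerical columns is reported as an error.
try:
    numeric_columns(subset)
    raised = False
except TypeError:
    raised = True

# Lenient mode: the caller receives an empty list and decides what to do.
empty = numeric_columns(subset, return_empty=True)
print(raised, empty)
```

The strict default suits workflows where missing numerical variables signal a data problem, while the lenient mode is convenient when the calling code can simply skip a step if nothing is found.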