find_numerical_variables
find_numerical_variables() returns a list with the names of the numerical
variables in the dataset.
Let’s create a toy dataset with numerical, categorical and datetime variables:
import pandas as pd
df = pd.DataFrame({
    "Name": ["tom", "nick", "krish", "jack"],
    "City": ["London", "Manchester", "Liverpool", "Bristol"],
    "Age": [20, 21, 19, 18],
    "Marks": [0.9, 0.8, 0.7, 0.6],
    "dob": pd.date_range("2020-02-24", periods=4, freq="min"),
})
print(df.head())
We see the resulting dataframe below:
    Name        City  Age  Marks                 dob
0    tom      London   20    0.9 2020-02-24 00:00:00
1   nick  Manchester   21    0.8 2020-02-24 00:01:00
2  krish   Liverpool   19    0.7 2020-02-24 00:02:00
3   jack     Bristol   18    0.6 2020-02-24 00:03:00
With find_numerical_variables() we capture the names of all the numerical
variables in a list. So let’s do that and then display the list:
from feature_engine.variable_handling import find_numerical_variables
var_num = find_numerical_variables(df)
var_num
We see the names of the numerical variables in the list below:
['Age', 'Marks']
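To build some intuition for which columns count as numerical, here is a minimal sketch that uses only pandas. It assumes that "numerical" corresponds to pandas' generic "number" dtype selector, which covers integers and floats but not strings or datetimes. This is an illustration of the idea, not necessarily how find_numerical_variables() is implemented. The names numeric_cols and summary are introduced here for the example only.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["tom", "nick", "krish", "jack"],
    "City": ["London", "Manchester", "Liverpool", "Bristol"],
    "Age": [20, 21, 19, 18],
    "Marks": [0.9, 0.8, 0.7, 0.6],
    "dob": pd.date_range("2020-02-24", periods=4, freq="min"),
})

# Assumption: numerical columns are those matched by the "number" selector.
numeric_cols = list(df.select_dtypes(include="number").columns)

# A list of column names can be used directly to subset the dataframe,
# for example to compute the mean of each numerical variable.
summary = df[numeric_cols].mean()
print(numeric_cols)
print(summary)
```

A list of names like this one is convenient because it can be passed on to any later step that should act on the numerical variables only.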
If there are no numerical variables in the dataset, find_numerical_variables()
will raise an error. For example, the command
find_numerical_variables(df[["Name", "City", "dob"]]) results in a TypeError because
there are no numerical variables in that subset of the data.
To return an empty list instead of raising an error when no numerical variables are
found, we need to set return_empty to True:
find_numerical_variables(df[["Name", "City", "dob"]], return_empty=True)
The previous command returns an empty list: [].