feature_engine.datasets.load_titanic(return_X_y_frame=False, predictors_only=False, handle_missing=False, cabin=None)

The load_titanic() function returns the well-known Titanic dataset.

Note that this function requires an internet connection, because it downloads the dataset stored in OpenML.

return_X_y_frame: bool, default=False

If True, it returns a DataFrame (X) with the predictors and a Series (y) with the target variable. If False, it returns a single DataFrame with predictors and target.
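To illustrate the difference between the two return formats, here is a minimal pandas sketch on a hypothetical toy frame (values copied from the example output at the end of this section). It assumes the split is equivalent to separating the `survived` column from the rest; it does not call `load_titanic()` and is not the library's implementation.

```python
import pandas as pd

# Hypothetical toy frame mimicking a few rows of the Titanic data.
data = pd.DataFrame(
    {
        "pclass": [1, 1, 1],
        "survived": [1, 1, 0],
        "sex": ["female", "male", "female"],
        "age": [29.0, 0.9167, 2.0],
    }
)

# return_X_y_frame=False: a single DataFrame with predictors and target (`data`).
# return_X_y_frame=True: predictors (X) and target (y) separately.
# Assumption: this is equivalent to splitting off the "survived" column.
X = data.drop(columns="survived")
y = data["survived"]

print(type(X).__name__, list(X.columns))
print(type(y).__name__, y.tolist())
```

With the real function, the second format would be obtained with `X, y = load_titanic(return_X_y_frame=True)`.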

predictors_only: bool, default=False

If False, it returns all the variables from the original Titanic dataset. If True, it returns only the relevant predictors (and the target); the example below shows the resulting columns.
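The effect of `predictors_only=True` can be sketched as a column selection. The list `PREDICTORS_AND_TARGET` below is a hypothetical assumption inferred from the example output on this page (plus `cabin`, which that example drops explicitly); the toy frame and its extra columns (`name`, `ticket`) are placeholders, and the library's exact list may differ.

```python
import pandas as pd

# Hypothetical list of "relevant predictors" plus the target, inferred from
# the example output on this page; the library's exact list may differ.
PREDICTORS_AND_TARGET = [
    "pclass", "survived", "sex", "age", "sibsp",
    "parch", "fare", "cabin", "embarked",
]


def select_predictors(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the assumed predictor columns and the target, in list order."""
    return df[[col for col in PREDICTORS_AND_TARGET if col in df.columns]]


# Toy one-row frame; "name" and "ticket" stand in for columns that are
# assumed to be discarded when predictors_only=True.
data = pd.DataFrame(
    {
        "pclass": [1], "survived": [1], "name": ["passenger A"],
        "sex": ["female"], "age": [29.0], "sibsp": [0], "parch": [0],
        "ticket": ["T-001"], "fare": [211.3375], "cabin": ["B5"],
        "embarked": ["S"],
    }
)

subset = select_predictors(data)
print(list(subset.columns))
```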

handle_missing: bool, default=False

If False, it returns the original dataset with missing values. If True, missing data is replaced with the string “Missing” in categorical variables and the mean in numerical variables.
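The imputation described for `handle_missing=True` can be sketched with plain pandas: non-numeric columns get the string "Missing" and numeric columns get their mean. This is a minimal illustration on a hypothetical toy frame; the helper `fill_missing` is not part of Feature-engine, and the library's internals may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with gaps in categorical and numerical columns.
data = pd.DataFrame(
    {
        "sex": ["female", None, "male"],
        "embarked": ["S", "C", None],
        "age": [29.0, np.nan, 31.0],
        "fare": [10.0, 20.0, np.nan],
    }
)


def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: mean for numeric columns, "Missing" otherwise."""
    df = df.copy()
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            # The mean ignores NaN, so it is computed from the observed values.
            df[col] = df[col].fillna(df[col].mean())
        else:
            df[col] = df[col].fillna("Missing")
    return df


filled = fill_missing(data)
print(filled)
```

Here the missing `age` becomes the mean of 29.0 and 31.0, and the missing `sex` becomes "Missing".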

cabin: str, default=None

If None, it returns the variable cabin as in the original data. If ‘drop’, it removes the variable from the data. If ‘letter_only’, it returns just the first letter of the cabin, without the number.
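The three `cabin` options can be sketched with pandas as below. The helper `transform_cabin` and the toy values are hypothetical; the sketch keeps missing cabins as NaN under `'letter_only'` and rejects unknown options, which are choices of this sketch rather than documented library behaviour.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the last passenger has no recorded cabin.
data = pd.DataFrame({"pclass": [1, 1, 3], "cabin": ["B5", "C22 C26", np.nan]})


def transform_cabin(df: pd.DataFrame, cabin=None) -> pd.DataFrame:
    """Hypothetical helper mirroring the documented cabin options."""
    df = df.copy()
    if cabin is None:
        return df  # keep the original values
    if cabin == "drop":
        return df.drop(columns="cabin")
    if cabin == "letter_only":
        # .str[0] takes the first character (the deck letter); NaN stays NaN.
        df["cabin"] = df["cabin"].str[0]
        return df
    # Assumption of this sketch: reject any other option.
    raise ValueError("cabin must be None, 'drop' or 'letter_only'")


print(transform_cabin(data, cabin="letter_only"))
```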


>>> from feature_engine.datasets import load_titanic
>>> data = load_titanic(predictors_only=True, cabin="drop")
>>> print(data.head())
   pclass  survived     sex      age  sibsp  parch      fare embarked
0       1         1  female  29.0000      0      0  211.3375        S
1       1         1    male   0.9167      1      2  151.5500        S
2       1         0  female   2.0000      1      2  151.5500        S
3       1         0    male  30.0000      1      2  151.5500        S
4       1         0  female  25.0000      1      2  151.5500        S