load_titanic
- feature_engine.datasets.load_titanic(return_X_y_frame=False, predictors_only=False, handle_missing=False, cabin=None)
The load_titanic() function returns the well-known Titanic dataset.
Note that this function needs an internet connection to work, because it retrieves the dataset from OpenML, where it is also available for download.
- Parameters
- return_X_y_frame: bool, default=False
If True, it returns a DataFrame (X) with the predictors and a Series (y) with the target variable. If False, it returns a single DataFrame with predictors and target.
- predictors_only: bool, default=False
If False, it returns all the variables from the original Titanic dataset. If True, it returns only the relevant predictors.
- handle_missing: bool, default=False
If False, it returns the original dataset with missing values. If True, missing data is replaced with the string "Missing" in categorical variables and with the mean in numerical variables.
- cabin: str, default=None
If None, it returns the variable cabin as in the original data. If 'drop', it removes the variable from the data. If 'letter_only', it returns just the first letter of the cabin, without the number.
Examples
>>> from feature_engine.datasets import load_titanic
>>> data = load_titanic(predictors_only=True, cabin="drop")
>>> print(data.head())
   pclass  survived     sex      age  sibsp  parch      fare embarked
0       1         1  female  29.0000      0      0  211.3375        S
1       1         1    male   0.9167      1      2  151.5500        S
2       1         0  female   2.0000      1      2  151.5500        S
3       1         0    male  30.0000      1      2  151.5500        S
4       1         0  female  25.0000      1      2  151.5500        S
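The effect of the handle_missing and cabin parameters described above can be illustrated offline on a small hand-made DataFrame. This is only a sketch of the documented behaviour using plain pandas, not the code that feature_engine runs internally. The frame `toy` and the helpers `sketch_handle_missing` and `sketch_cabin` are hypothetical names introduced for illustration, and the rows are made up rather than taken from the real dataset.

```python
import pandas as pd

# Hypothetical toy rows shaped like a few Titanic columns (not real data).
toy = pd.DataFrame(
    {
        "age": [29.0, None, 2.0],
        "cabin": ["B5", "C22 C26", None],
        "embarked": ["S", None, "C"],
    }
)


def sketch_handle_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Assumed equivalent of handle_missing=True: numerical gaps get the
    column mean, categorical gaps get the string "Missing"."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].mean())
        else:
            out[col] = out[col].fillna("Missing")
    return out


def sketch_cabin(df: pd.DataFrame, cabin=None) -> pd.DataFrame:
    """Assumed equivalent of the cabin option: None keeps the column,
    "drop" removes it, "letter_only" keeps the first character."""
    out = df.copy()
    if cabin == "drop":
        out = out.drop(columns=["cabin"])
    elif cabin == "letter_only":
        out["cabin"] = out["cabin"].str[0]
    return out


filled = sketch_handle_missing(toy)               # age gap becomes 15.5
letters = sketch_cabin(toy, "letter_only")["cabin"]  # "B5" becomes "B"
dropped = sketch_cabin(toy, "drop")               # no cabin column left
```

With the real function, the same options are passed directly, for example load_titanic(handle_missing=True, cabin="letter_only"), which requires an internet connection.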