GeoDistanceFeatures
- class feature_engine.creation.GeoDistanceFeatures(lat1, lon1, lat2, lon2, method='haversine', output_unit='km', output_col='geo_distance', drop_original=False, validate_ranges=True)
GeoDistanceFeatures() calculates the distance between two geographical coordinate pairs (latitude/longitude) and adds the result as a new feature.
This transformer is useful for location-based machine learning problems such as real estate pricing, delivery route optimization, ride-sharing applications, and any domain where geographic proximity is relevant.
The transformer supports different distance calculation methods:
‘haversine’: Great-circle distance using the Haversine formula (default). The most accurate of the three methods for distances on Earth’s surface; it treats the Earth as a sphere.
‘euclidean’: Simple Euclidean distance in the coordinate space. Fast but less accurate for long distances.
‘manhattan’: Manhattan (taxicab) distance in coordinate space. Useful as a rough approximation for grid-based city layouts.
More details in the User Guide.
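The three methods can be sketched in plain Python as follows. This is a minimal illustration, not the transformer's implementation: the function names and the Earth radius of 6371 km are assumptions, and the two coordinate-space functions return values in degrees because the page does not state how the transformer scales them to `output_unit`.

```python
import math

# Assumed mean Earth radius; the constant used by the transformer is not stated.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    # Clamp to 1.0 to guard against rounding error for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def euclidean_deg(lat1, lon1, lat2, lon2):
    """Straight-line distance in coordinate space, in degrees (hypothetical helper)."""
    return math.hypot(lat2 - lat1, lon2 - lon1)


def manhattan_deg(lat1, lon1, lat2, lon2):
    """Taxicab distance in coordinate space, in degrees (hypothetical helper)."""
    return abs(lat2 - lat1) + abs(lon2 - lon1)


new_york = (40.7128, -74.0060)
los_angeles = (34.0522, -118.2437)

d_hav = haversine_km(*new_york, *los_angeles)
d_euc = euclidean_deg(*new_york, *los_angeles)
d_man = manhattan_deg(*new_york, *los_angeles)
print(f"haversine: {d_hav:.1f} km, euclidean: {d_euc:.3f} deg, manhattan: {d_man:.3f} deg")
```

The coordinate-space distances ignore that a degree of longitude covers less ground away from the equator, which is one reason they degrade over long distances.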
- Parameters
- lat1: str
Column name containing the latitude of the first point.
- lon1: str
Column name containing the longitude of the first point.
- lat2: str
Column name containing the latitude of the second point.
- lon2: str
Column name containing the longitude of the second point.
- method: str, default=’haversine’
The distance calculation method. Options are:
  - ‘haversine’: Great-circle distance (most accurate)
  - ‘euclidean’: Euclidean distance in coordinate space
  - ‘manhattan’: Manhattan distance in coordinate space
- output_unit: str, default=’km’
The unit for the output distance. Options are:
  - ‘km’: Kilometers
  - ‘miles’: Miles
  - ‘meters’: Meters
  - ‘feet’: Feet
- output_col: str, default=’geo_distance’
Name of the new column containing the calculated distances.
- drop_original: bool, default=False
Whether to drop the original coordinate columns after transformation.
- validate_ranges: bool, default=True
Whether to validate that latitude values are within [-90, 90] and longitude values are within [-180, 180]. If False, coordinates outside valid ranges may produce incorrect distance calculations.
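As a rough illustration of what `validate_ranges` and `output_unit` imply, the sketch below checks a coordinate pair against the documented ranges and converts a distance in kilometers to the other listed units. The helper names, the error messages, and the choice to raise `ValueError` are assumptions; the page does not say how the transformer reports invalid coordinates.

```python
# Kilometers per unit, using the standard definitions of the mile and the foot.
KM_PER_UNIT = {
    "km": 1.0,
    "miles": 1.609344,
    "meters": 0.001,
    "feet": 0.0003048,
}


def validate_coordinates(lat, lon):
    """Raise ValueError if a latitude/longitude pair is outside the valid ranges."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude {lat} is outside [-90, 90]")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude {lon} is outside [-180, 180]")


def convert_km(distance_km, output_unit="km"):
    """Convert a distance in kilometers to one of the documented output units."""
    if output_unit not in KM_PER_UNIT:
        raise ValueError(f"unknown output_unit: {output_unit!r}")
    return distance_km / KM_PER_UNIT[output_unit]


validate_coordinates(40.7128, -74.0060)  # passes silently
print(convert_km(100.0, "miles"))
print(convert_km(1.0, "feet"))
```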
- Attributes
- variables_:
List of the coordinate variables used for distance calculation.
- feature_names_in_:
List with the names of features seen during fit.
- n_features_in_:
The number of features in the train set used in fit.
See also
feature_engine.creation.MathFeatures: Combines existing features using mathematical operations.
feature_engine.creation.RelativeFeatures: Creates features relative to reference variables.
References
- [1] Haversine formula: https://en.wikipedia.org/wiki/Haversine_formula
Examples
>>> import pandas as pd
>>> from feature_engine.creation import GeoDistanceFeatures
>>> X = pd.DataFrame({
...     'origin_lat': [40.7128, 34.0522, 41.8781],
...     'origin_lon': [-74.0060, -118.2437, -87.6298],
...     'dest_lat': [34.0522, 41.8781, 40.7128],
...     'dest_lon': [-118.2437, -87.6298, -74.0060],
... })
>>> gdt = GeoDistanceFeatures(
...     lat1='origin_lat', lon1='origin_lon',
...     lat2='dest_lat', lon2='dest_lon',
...     method='haversine', output_unit='km'
... )
>>> gdt.fit(X)
>>> X = gdt.transform(X)
>>> X
   origin_lat  origin_lon  dest_lat  dest_lon  geo_distance
0     40.7128    -74.0060   34.0522 -118.2437   3935.746254
1     34.0522   -118.2437   41.8781  -87.6298   2808.517344
2     41.8781    -87.6298   40.7128  -74.0060   1144.286561
Methods
fit:
This transformer does not learn parameters.
fit_transform:
Fit to data, then transform it.
transform:
Calculate distances and add them as a new column.
get_feature_names_out:
Get output feature names for transformation.
- fit(X, y=None)
This transformer does not learn parameters.
- Parameters
- X: pandas dataframe of shape = [n_samples, n_features]
The training input samples.
- y: pandas Series, or np.array. Defaults to None.
It is not needed in this transformer. You can pass y or None.
- Returns
- self: GeoDistanceFeatures
The fitted transformer.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
- X: array-like of shape (n_samples, n_features)
Input samples.
- y: array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params: dict
Additional fit parameters. Pass only if the estimator accepts additional params in its fit method.
- Returns
- X_new: ndarray of shape (n_samples, n_features_new)
Transformed array.
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns
- routing: MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
- deep: bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- params: dict
Parameter names mapped to their values.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
- **params: dict
Estimator parameters.
- Returns
- self: estimator instance
Estimator instance.
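The `<component>__<parameter>` naming used by `get_params` and `set_params` can be illustrated with a small scikit-learn pipeline. The step names `scaler` and `model`, and the choice of `StandardScaler` and `Ridge`, are illustrative assumptions rather than part of this page; any estimator that follows the scikit-learn parameter convention should behave the same way inside a `Pipeline`.

```python
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical two-step pipeline; step names become the <component> prefix.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", Ridge(alpha=1.0)),
])

# Update parameters of nested steps with <component>__<parameter> keys.
pipe.set_params(model__alpha=0.1, scaler__with_mean=False)

# deep=True also returns the parameters of the contained estimators.
params = pipe.get_params(deep=True)
print(params["model__alpha"], params["scaler__with_mean"])
```

No fitting is needed here: `set_params` and `get_params` only read and write constructor arguments.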