Nearest neighbor in geospatial data analysis


A common GIS task is to find the nearest neighbor among several candidates. A typical real-life question would be: “Where is the nearest metro station to me?”

In Python, this analysis can be done with Shapely and GeoPandas.

Nearest point using Shapely

from shapely.geometry import Point, MultiPoint
from shapely.ops import nearest_points

# create the source (origin) point
source = Point(2, 5.66)
# create three candidate target points
target1, target2, target3 = Point(4, 3), Point(2, 2), Point(3, 6)
# create a MultiPoint object for the target points
targets = MultiPoint([target1, target2, target3])
print(targets)
MULTIPOINT (4 3, 2 2, 3 6)
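# nearest_points returns a tuple with the nearest point on each input geometry,
# in the same order as the inputs
nearest_geometry = nearest_points(source, targets)
source_point, nearest_target = nearest_geometry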
print('Coordinates of the original point: ', source_point)
print('Coordinates of the closest destination point: ', nearest_target)
Coordinates of the original point:  POINT (2 5.66)
Coordinates of the closest destination point:  POINT (3 6)
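
For point targets, the result of nearest_points can be cross-checked with a brute-force search over the candidates. The following is a minimal sketch using only the standard library. The helper name `nearest_by_distance` is hypothetical and not part of Shapely, and it assumes planar (Euclidean) coordinates.

```python
import math

def nearest_by_distance(source, targets):
    """Return the target (x, y) tuple with the smallest Euclidean distance to source."""
    return min(targets, key=lambda t: math.dist(source, t))

# same coordinates as the Shapely example, written as plain tuples
source_xy = (2, 5.66)
targets_xy = [(4, 3), (2, 2), (3, 6)]

nearest_xy = nearest_by_distance(source_xy, targets_xy)
print(nearest_xy)  # (3, 6)
```

A linear scan like this is fine for a handful of candidates. For large target sets a spatial index, as used in the GeoPandas approach, avoids comparing every pair.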

Nearest points using GeoPandas

import geopandas as gpd
from shapely.geometry import Point

# Generate origin and destination points as GeoDataFrames
source = {
    'name': ['Source_1', 'Source_2'],
    'geometry': [Point(24.78, 60.73), Point(21.50, 61.13)]
}
source = gpd.GeoDataFrame(source, crs='EPSG:4326')

target = {
    'name': ['Helsinki', 'Tampere', 'Turku'],
    'geometry': [Point(24.95, 60.19), Point(23.79, 61.50), Point(22.27, 60.45)]
}
target = gpd.GeoDataFrame(target, crs='EPSG:4326')

target

       name                   geometry
0  Helsinki  POINT (24.95000 60.19000)
1   Tampere  POINT (23.79000 61.50000)
2     Turku  POINT (22.27000 60.45000)

# Find the position of the nearest target for each source point.
# sindex.nearest returns an ndarray of shape (2, n):
#   - the first row holds the positions of the input (source) geometries,
#   - the second row holds the positions of the matching tree (target) geometries.
# Distances (an ndarray of shape (n,)) are returned only when return_distance=True.
nearest_index = target.sindex.nearest(source['geometry'])
nearest_index
array([[0, 1],
       [0, 2]], dtype=int64)
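
To read this result, pair the two rows column by column: each column is one (source position, target position) match. The sketch below uses a plain nested list as a stand-in for the array above, and the `source_names` and `target_names` lists are illustrative copies of the name columns rather than anything returned by GeoPandas.

```python
# Stand-in for the ndarray shown above:
# row 0 = source positions, row 1 = target positions
index_pairs = [[0, 1], [0, 2]]
source_names = ['Source_1', 'Source_2']
target_names = ['Helsinki', 'Tampere', 'Turku']

# zip(*index_pairs) walks the columns: (0, 0), then (1, 2)
matches = [
    (source_names[src_pos], target_names[tgt_pos])
    for src_pos, tgt_pos in zip(*index_pairs)
]
print(matches)  # [('Source_1', 'Helsinki'), ('Source_2', 'Turku')]
```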
# get the nearest target
nearest_target = target.iloc[nearest_index[1]].rename(columns={'name':'nearest target','geometry':'target geometry'})
nearest_target = nearest_target.reset_index(drop=True)
import pandas as pd
source = source.rename(columns={'name':'source name', 'geometry': 'source geometry'})
pd.concat([source, nearest_target], axis=1)

  source name            source geometry nearest target            target geometry
0    Source_1  POINT (24.78000 60.73000)       Helsinki  POINT (24.95000 60.19000)
1    Source_2  POINT (21.50000 61.13000)          Turku  POINT (22.27000 60.45000)
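
Because the GeoDataFrames use EPSG:4326, the coordinates are longitude/latitude in degrees, and the spatial index compares them as planar values. As a sanity check, the great-circle distances can be computed with the haversine formula. This is a stdlib-only sketch: the `haversine_km` helper and the mean Earth radius of 6371 km are assumptions for illustration, not part of GeoPandas.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# same (lon, lat) pairs as in the GeoDataFrames above
sources_lonlat = {'Source_1': (24.78, 60.73), 'Source_2': (21.50, 61.13)}
targets_lonlat = {'Helsinki': (24.95, 60.19), 'Tampere': (23.79, 61.50), 'Turku': (22.27, 60.45)}

# for each source, pick the target name with the smallest great-circle distance
nearest_by_haversine = {
    name: min(targets_lonlat, key=lambda t: haversine_km(*lonlat, *targets_lonlat[t]))
    for name, lonlat in sources_lonlat.items()
}
print(nearest_by_haversine)  # {'Source_1': 'Helsinki', 'Source_2': 'Turku'}
```

For this data both approaches agree. At high latitudes, however, a degree of longitude is much shorter than a degree of latitude, so reprojecting to a suitable projected CRS before the nearest-neighbor search is generally the safer choice.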

Author: wenvenn
Reprint policy: Unless otherwise stated, all articles in this blog are licensed under CC BY 4.0. If you reproduce this article, please credit the source: wenvenn.