I am trying to find all the nearest neighbors within a radius of 1 KM. Here is my script for building a tree and finding nearby points,
from pysal.cg.kdtree import KDTree
def construct_tree(s):
data_geopoints = [tuple(x) for x in s[['longitude','latitude']].to_records(index=False)]
tree = KDTree(data_geopoints, distance_metric='Arc', radius=pysal.cg.RADIUS_EARTH_KM)
return tree
def get_neighbors(s,tree):
indices = tree.query_ball_point(s, 1)
return indices
tree = construct_tree(data)
data['neighborhood'] = data['lat_long'].apply(lambda row: get_neighbors(row,tree))
From what I read on the pysal page, it says:
kd-tree built on top of the kd-tree functionality in scipy. If you use scipy 0.12 or more, use scipy.spatial.cKDTree, otherwise use scipy.spatial.KDTree.
In my case, it should use cKDTree. This works fine for a sample dataset, but since it tree.query_ball_pointreturns a list of indexes as a result. Each list will contain 100 items. For my data (2 million records), it gets bigger and bigger and stops due to a memory problem after a certain point. Any idea on how to solve this?