RTree: count points in neighborhoods at each point in another set of points

Why does this not return the number of points in each neighborhood (bounding box)?

import geopandas as gpd def radius(points_neighbour, points_center, new_field_name, r): """ :param points_neighbour: :param points_center: :param new_field_name: new field_name attached to points_center :param r: radius around points_center :return: """ sindex = points_neighbour.sindex pts_in_neighbour = [] for i, pt_center in points_center.iterrows(): nearest_index = list(sindex.intersection((pt_center.LATITUDE-r, pt_center.LONGITUDE-r, pt_center.LATITUDE+r, pt_center.LONGITUDE+r))) pts_in_this_neighbour = points_neighbour[nearest_index] pts_in_neighbour.append(len(pts_in_this_neighbour)) points_center[new_field_name] = gpd.GeoSeries(pts_in_neighbour) 

Each cycle gives the same result.

Second question: how to find the k-th nearest neighbor?

Additional information about the problem itself:

  • We do this on a very small scale, for example. Washington, USA or British Columbia, Canada

  • We hope to use as much geodata as possible, as it is similar to pandas and supports spatial indexing: RTree

  • For example, sindex has the closest method, intersection, etc.

Please comment if you need more information. This is the code in the GeoPandasBase class

 @property def sindex(self): if not self._sindex_generated: self._generate_sindex() return self._sindex 

I tried Richard's example, but that didn't work

 def radius(points_neighbour, points_center, new_field_name, r): """ :param points_neighbour: :param points_center: :param new_field_name: new field_name attached to points_center :param r: radius around points_center :return: """ sindex = points_neighbour.sindex pts_in_neighbour = [] for i, pt_center in points_center.iterrows(): pts_in_this_neighbour = 0 for n in sindex.intersection(((pt_center.LATITUDE-r, pt_center.LONGITUDE-r, pt_center.LATITUDE+r, pt_center.LONGITUDE+r))): dist = pt_center.distance(points_neighbour['geometry'][n]) if dist < radius: pts_in_this_neighbour = pts_in_this_neighbour + 1 pts_in_neighbour.append(pts_in_this_neighbour) points_center[new_field_name] = gpd.GeoSeries(pts_in_neighbour) 

To download the form file, go to https://catalogue.data.gov.bc.ca/dataset/hellobc-activities-and-attractions-listing and select ArcView to download

+7
spatial-index gis geospatial r-tree geopandas
source share
2 answers

I added code that should do what you want with some minor changes.

I think your problem arose for one of two reasons:

  • You built the spatial index incorrectly. Your responses to my comments suggested that you were not fully aware of how the spatial index is created.

  • The bounding box for your spatial query was not built correctly.

I will discuss both possibilities below.

Building a spatial index

As it turned out, the spatial index is built simply by entering:

 sindex = gpd_df.sindex 

Magic.

But where gpd_df.sindex get its data from? It is assumed that the data is stored in a geometry column in shapely format. If you have not added data to such a column, this will raise a warning.

Proper initialization of the data frame will look like this:

 #Generate random points throughout Oregon x = np.random.uniform(low=oregon_xmin, high=oregon_xmax, size=10000) y = np.random.uniform(low=oregon_ymin, high=oregon_ymax, size=10000) #Turn the lat-long points into a geodataframe gpd_df = gpd.GeoDataFrame(data={'x':x, 'y':y}) #Set up point geometries so that we can index the data frame #Note that I am using xy points! gpd_df['geometry'] = gpd_df.apply(lambda row: shapely.geometry.Point((row['x'], row['y'])), axis=1) #Automagically constructs a spatial index from the `geometry` column gpd_df.sindex 

Seeing the example code example above in your question would help you diagnose your problem and continue solving it.

Since you didn’t receive the extremely obvious geopandas warning when a geometry column is missing:

AttributeError: geometry data has not yet been set (expected in column geometry).

I think you probably did this part right.

Building a bounding box

In your question, you form a bounding box as follows:

 nearest_index = list(sindex.intersection((pt_center.LATITUDE-r, pt_center.LONGITUDE-r, pt_center.LATITUDE+r, pt_center.LONGITUDE+r))) 

As it turned out, the bounding rectangles have the form:

 (West, South, East, North) 

At least they are executed for stylized XY points, for example. shapely.geometry.Point(Lon,Lat)

In my code, I use the following:

 bbox = (cpt.x-radius, cpt.y-radius, cpt.x+radius, cpt.y+radius) 

Working example

Combining the above leads me to this working example. Please note that I also demonstrate how to sort points by distance, answering your second question.

 #!/usr/bin/env python3 import numpy as np import numpy.random import geopandas as gpd import shapely.geometry import operator oregon_xmin = -124.5664 oregon_xmax = -116.4633 oregon_ymin = 41.9920 oregon_ymax = 46.2938 def radius(gpd_df, cpt, radius): """ :param gpd_df: Geopandas dataframe in which to search for points :param cpt: Point about which to search for neighbouring points :param radius: Radius about which to search for neighbours :return: List of point indices around the central point, sorted by distance in ascending order """ #Spatial index sindex = gpd_df.sindex #Bounding box of rtree search (West, South, East, North) bbox = (cpt.x-radius, cpt.y-radius, cpt.x+radius, cpt.y+radius) #Potential neighbours good = [] for n in sindex.intersection(bbox): dist = cpt.distance(gpd_df['geometry'][n]) if dist<radius: good.append((dist,n)) #Sort list in ascending order by `dist`, then `n` good.sort() #Return only the neighbour indices, sorted by distance in ascending order return [x[1] for x in good] #Generate random points throughout Oregon x = np.random.uniform(low=oregon_xmin, high=oregon_xmax, size=10000) y = np.random.uniform(low=oregon_ymin, high=oregon_ymax, size=10000) #Turn the lat-long points into a geodataframe gpd_df = gpd.GeoDataFrame(data={'x':x, 'y':y}) #Set up point geometries so that we can index the data frame gpd_df['geometry'] = gpd_df.apply(lambda row: shapely.geometry.Point((row['x'], row['y'])), axis=1) #The 'x' and 'y' columns are now stored as part of the geometry, so we remove #their columns in order to save space del gpd_df['x'] del gpd_df['y'] for i, row in gpd_df.iterrows(): neighbours = radius(gpd_df,row['geometry'],0.5) print(neighbours) #Use len(neighbours) here to construct a new row for the data frame 

(What I requested in the comments is code that looks like the previous one, but that illustrates your problem. Note the use of random to briefly create a data set for experiments.)

+3
source share

Instead of directly answering your question, I would say that you are doing it wrong. Arguing this, I will give the best answer.

Why are you wrong

The R-tree is great for querying with a limited box in two or three Euclidean dimensions.

You are viewing longitude-latitude points on a two-dimensional surface curved in three-dimensional space. As a result, your coordinate system will produce singularities and breaks: 180 ° W coincides with 180 ° E, 2 ° E at 90 ° N is close to 2 ° W at 90 ° s. Sh. R-tree does not capture such things!

But, even if they were a good solution, your idea of ​​taking lat ± r and lon ± r gives a square area; rather, you probably need a circular area around your point.

How to do it right

  • Instead of saving points in lon-lat format, convert them to xyz format using a spherical coordinate transformation . Now they are in three-dimensional Euclidean space and there are no features or gaps.

  • Place the dots in the three-dimensional kd-tree . This allows you to quickly, in O (log n) time, ask questions such as "What are the k-closest neighbors on this issue?" and "What are all the points in the radius r of these points?" SciPy comes with an implementation .

  • To search for a radius, convert from Great Circle radius to chord : this makes a search in 3-space equivalent to finding a radius on a circle wrapped on the surface of a sphere (in this case, the Earth).

Code for correct

I implemented the above in Python as a demo. Please note that all spherical points are stored in the format (longitude, latitude) / (xy) using the scheme lon = [- 180,180], lat = [- 90.90]. All three-dimensional points are stored in the format (x, y, z).

 #/usr/bin/env python3 import numpy as np import scipy as sp import scipy.spatial Rearth = 6371 #Generate uniformly-distributed lon-lat points on a sphere #See: http://mathworld.wolfram.com/SpherePointPicking.html def GenerateUniformSpherical(num): #Generate random variates pts = np.random.uniform(low=0, high=1, size=(num,2)) #Convert to sphere space pts[:,0] = 2*np.pi*pts[:,0] #0-360 degrees pts[:,1] = np.arccos(2*pts[:,1]-1) #0-180 degrees #Convert to degrees pts = np.degrees(pts) #Shift ranges to lon-lat pts[:,0] -= 180 pts[:,1] -= 90 return pts def ConvertToXYZ(lonlat): theta = np.radians(lonlat[:,0])+np.pi phi = np.radians(lonlat[:,1])+np.pi/2 x = Rearth*np.cos(theta)*np.sin(phi) y = Rearth*np.sin(theta)*np.sin(phi) z = Rearth*np.cos(phi) return np.transpose(np.vstack((x,y,z))) #Get all points which lie with `r_km` Great Circle kilometres of the query #points `qpts`. def GetNeighboursWithinR(qpts,kdtree,r_km): #We need to convert Great Circle kilometres into chord length kilometres in #order to use the kd-tree #See: http://mathworld.wolfram.com/CircularSegment.html angle = r_km/Rearth chord_length = 2*Rearth*np.sin(angle/2) pts3d = ConvertToXYZ(qpts) #See: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.KDTree.query_ball_point.html#scipy.spatial.KDTree.query_ball_point #p=2 implies Euclidean distance, eps=0 implies no approximation (slower) return kdtree.query_ball_point(pts3d,chord_length,p=2,eps=0) ############################################################################## #WARNING! Do NOT alter pts3d or kdtree will malfunction and need to be rebuilt ############################################################################## ############################## #Correctness tests on the North, South, East, and West poles, along with Kolkata ptsll = np.array([[0,90],[0,-90],[0,0],[-180,0],[88.3639,22.5726]]) pts3d = ConvertToXYZ(ptsll) kdtree = sp.spatial.KDTree(pts3d, leafsize=10) #Stick points in kd-tree for fast look-up qptsll = np.array([[-3,88],[5,-85],[10,10],[-178,3],[175,4]]) GetNeighboursWithinR(qptsll, kdtree, 2000) ############################## #Stress tests ptsll = GenerateUniformSpherical(100000) #Generate uniformly-distributed lon-lat points on a sphere pts3d = ConvertToXYZ(ptsll) #Convert points to 3d #See: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.KDTree.html kdtree = sp.spatial.KDTree(pts3d, leafsize=10) #Stick points in kd-tree for fast look-up qptsll = GenerateUniformSpherical(100) #We'll find neighbours near these points GetNeighboursWithinR(qptsll, kdtree, 500) 
+2
source share

All Articles