I'm trying to find millions of points inside a half-dozen polygons. Here is my code:
def find_shape(longitude,latitude): if longitude != 0 and latitude != 0: point = shapely.geometry.Point(longitude,latitude) else: return "Unknown" for current_shape in all_shapes: if current_shape['bounding_box'].contains(point): if current_shape['shape'].contains(point): return current_shape['properties']['ShapeName'] break return "Unknown"
I read other questions regarding performance improvements for point-to-polygon queries spectacularly. They offer rtrees. However, this seems to be useful for cases where there are many polygons ( 36,000 in one question , 100,000 in another ), and it is not advisable to iterate over them.
I am already creating a bounding box, as you can see. Here is my form setup code:
with fiona.open(SHAPEFILE) as f_col: all_shapes = [] for shapefile_record in f_col: current_shape = {} current_shape['shape'] = shapely.geometry.asShape(shapefile_record['geometry']) minx, miny, maxx, maxy = current_shape['shape'].bounds current_shape['bounding_box'] = shapely.geometry.box(minx, miny, maxx, maxy) current_shape['properties'] = shapefile_record['properties'] all_shapes.append(current_shape)
Would it also be useful to check out another very simplified version of the form , namely one of the largest inscribed rectangle (or maybe a triangle)?
Checking beautiful documents, there is no function for this. Perhaps some simplify() settings? Of course, I always want to make sure that the new simplified form does not go beyond the original form, so I do not need to call contains() on the actual figure. I also think that I want to make the new simplified form as simple as possible for speed.
Any other suggestions are also welcome. Thanks!
EDIT : awaiting answers, I applied this idea to create a simplified form that meets my requirements:
current_shape['simple_shape'] = current_shape['shape'].simplify(.5) current_shape['simple_shape'] = current_shape['simple_shape'].intersection(current_shape['shape'])
Here's how I use it when testing each point:
if current_shape['simple_shape'].contains(point): return current_shape['properties']['ShapeName'] elif current_shape['shape'].contains(point): return current_shape['properties']['ShapeName']
This is not ideal, because the form is not as simple as it can be after performing the necessary intersection() . However, this approach has reduced processing time by 60%. In my tests, a simple polygon is used on 85% of point queries.
EDIT 2 . Another related question regarding GIS StackExchange: Python efficiency - we need more efficient ways to use OGR and Shapely . This refers to 1.5 million points in about 3,000 polygons.