Optimization of ST_Intersects in PostgreSQL (PostGIS)

Question

Optimization of ST_Intersects in PostgreSQL (PostGIS)

The query below displays almost 15 minutes for the result to appear. And I wonder why? Because of the data? Or tops of geometries? When I tried a query with another table (small shapefile size), it works fast.

Here is the request. (Thanks to Patrick for this):

WITH hi AS ( SELECT ps.id, ps.brgy_locat, ps.municipali FROM evidensapp_polystructures ps JOIN evidensapp_seniangcbr fh ON fh.hazard = 'High' AND ST_Intersects(fh.geom, ps.geom) ), med AS ( SELECT ps.id, ps.brgy_locat, ps.municipali FROM evidensapp_polystructures ps JOIN evidensapp_seniangcbr fh ON fh.hazard = 'Medium' AND ST_Intersects(fh.geom, ps.geom) EXCEPT SELECT * FROM hi ), low AS ( SELECT ps.id, ps.brgy_locat, ps.municipali FROM evidensapp_polystructures ps JOIN evidensapp_seniangcbr fh ON fh.hazard = 'Low' AND ST_Intersects(fh.geom, ps.geom) EXCEPT SELECT * FROM hi EXCEPT SELECT * FROM med ) SELECT brgy_locat AS barangay, municipali AS municipality, high, medium, low FROM (SELECT brgy_locat, municipali, count(*) AS high FROM hi GROUP BY 1, 2) cnt_hi FULL JOIN (SELECT brgy_locat, municipali, count(*) AS medium FROM med GROUP BY 1, 2) cnt_med USING (brgy_locat, municipali) FULL JOIN (SELECT brgy_locat, municipali, count(*) AS low FROM low GROUP BY 1, 2) cnt_low USING (brgy_locat, municipali);

PostgreSQL 9.3, PostGIS 2.1.5

Polystructures table: contains 9847 rows:

 CREATE TABLE evidensapp_polystructures ( id serial NOT NULL PRIMARY KEY, bldg_name character varying(100) NOT NULL, bldg_type character varying(50) NOT NULL, brgy_locat character varying(50) NOT NULL, municipali character varying(50) NOT NULL, province character varying(50) NOT NULL, geom geometry(MultiPolygon,32651) ); CREATE INDEX evidensapp_polystructures_geom_id ON evidensapp_polystructures USING gist (geom); ALTER TABLE evidensapp_polystructures CLUSTER ON evidensapp_polystructures_geom_id;

SeniangCBR table: only 6 rows, shapefile size (if that matters): 52.060 KB

 CREATE TABLE evidensapp_seniangcbr ( id serial NOT NULL PRIMARY KEY, hazard character varying(16) NOT NULL, geom geometry(MultiPolygon,32651) ); CREATE INDEX evidensapp_seniangcbr_geom_id ON evidensapp_seniangcbr USING gist (geom); ALTER TABLE evidensapp_seniangcbr CLUSTER ON evidensapp_seniangcbr_geom_id;

All data was automatically loaded into the database using the LayerMapping utility, since I am using Django (GeoDjango) .

ANNOUNCE ANALYSIS LINE HERE.

I don’t have a server now, I run the request on my PC.

Processor: Intel (R) Core (TM) i7-4790 CPU @ 3.60 GHz (8 processors), ~ 3.6 GHz
Memory: 8192 MB RAM
OS: Windows 7 64-bit

+4

django postgresql postgresql-9.3 query-optimization postgis

Sachi tekina Aug 4 '15 at 2:43

source share

3 answers

The output of EXPLAIN ANALYZE difficult to read because all fields and functions are scrambled into the radio alphabet . However, two things stand out:

Most of the time is spent on the ST_Intersects() function, and this is not surprising.
The EXCEPT clause also seems rather inefficient.

So please try this and not a more detailed version:

 SELECT brgy_locat AS barangay, municipali AS municipality, sum(CASE max_hz_id WHEN 3 THEN 1 ELSE 0 END) AS high, sum(CASE max_hz_id WHEN 2 THEN 1 ELSE 0 END) AS medium, sum(CASE max_hz_id WHEN 1 THEN 1 ELSE 0 END) AS low FROM ( SELECT ps.id, ps.brgy_locat, ps.municipali, max(CASE fh.hazard WHEN 'Low' THEN 1 WHEN 'Medium' THEN 2 WHEN 'High' THEN 3 END) AS max_hz_id FROM evidensapp_polystructures ps JOIN evidensapp_seniangcbr fh ON ST_Intersects(fh.geom, ps.geom) GROUP BY 1, 2, 3 ) AS ps_fh GROUP BY 1, 2;

Currently, there is only one call to ST_Intersects() , which may (possibly) be much faster than three calls on subsets of the hazard map (due to the internal effectiveness of PostGIS code).

As you know, the string of the hazard class is converted to a series of integers that make it easy to organize and compare. The internal query selects the maximum hazard value that matches your requirement. In the main query, these maximum values for each structure are summed into the corresponding columns. If at all possible, change the table structure to use these three integer codes and a reference to the helper table for the class label: your table will be reduced and therefore faster and the CASE statement in the internal query can be deleted. Alternatively, add a column with the whole code and update values according to the danger column.

Note that these CASE statements are not very efficient (the reason I used the EXCEPT clause in the previous answer). PG 9.4 introduces a new FILTER clause for aggregate functions, which simplifies and simplifies reading a query:

 count(id) FILTER (WHERE max_hz_id = 3) AS high

You might want to consider updating.

Selamat Mule Meinil

+2

Patrick Aug 4 '15 at 4:36

source share

Add the bounding_box geometry(Polygon,4326) column bounding_box geometry(Polygon,4326) to the table. The column value will be the bounding box (max x, y and min x, y multipolygon ) that fully encapsulates multipolygon .

Then your request will look like this:

 AND ST_Intersects(fh.bounding_box, ps.bounding_box) AND ST_Intersects(fh.geom, ps.geom)

The advantage of this is that the first call to ST_Intersects pretty fast. If it returns false, the second, more involved call to ST_Intersects never called, so you save some time in this case.

+1

caps Aug 16 '15 at 19:05

source share

Erwin brandstetter · Accepted Answer · 2015-08-18T02:16:31+0000

Like I suggested and explained on your subject , I would use UNION ALL instead of FULL JOIN in an external SELECT .

 WITH hi AS ( SELECT ps.brgy_locat, ps.municipali, fh.hazard, count(*) AS ct FROM evidensapp_seniangcbr fh JOIN evidensapp_polystructures ps ON ST_Intersects(fh.geom, ps.geom) WHERE fh.hazard = 'High' GROUP BY 1, 2, 3 ) , med AS ( SELECT ps.brgy_locat, ps.municipali, fh.hazard, count(*) AS ct FROM evidensapp_seniangcbr fh JOIN evidensapp_polystructures ps ON ST_Intersects(fh.geom, ps.geom) LEFT JOIN hi USING (brgy_locat, municipali) WHERE fh.hazard = 'Medium' AND hi.brgy_locat IS NULL GROUP BY 1, 2, 3 ) TABLE hi UNION ALL TABLE med UNION ALL SELECT ps.brgy_locat, ps.municipali, fh.hazard, count(*) AS ct FROM evidensapp_seniangcbr fh JOIN evidensapp_polystructures ps ON ST_Intersects(fh.geom, ps.geom) LEFT JOIN hi USING (brgy_locat, municipali) LEFT JOIN med USING (brgy_locat, municipali) WHERE fh.hazard = 'Low' AND hi.brgy_locat IS NULL AND med.brgy_locat IS NULL GROUP BY 1, 2, 3;

This applies only to the highest level of danger for each set of lines with identical (brgy_locat, municipali) . The result is only lines that actually intersect with any line of the corresponding hazard level in evidensapp_seniangcbr . In addition, the counter only counts lines that actually intersect. There may be more lines with the same (brgy_locat, municipali) in evidensapp_polystructures , simply not intersecting with the same level of danger and, therefore, ignored.

Choose one of the standard methods to exclude lines for which you have already found a match at a higher hazard level at the lower levels.

Select rows not in another table.

LEFT JOIN / IS NULL should use the index on id and work very well here. Of course, faster than using EXCEPT based on the whole row, which cannot use the index.

Index

You do not need to add the bounding_box geometry column to the table, as suggested by another answer. In modern versions, PostGIS uses (index constraint) frame constraint automatically . PostGIS Documentation:

This function call automatically includes a bounding box comparison, which will use any indexes available in the geometry.

In fact, we already see the index scan in explaining the output you posted.

Your existing GiST index evidensapp_polystructures_geom_id should complete the query quickly.
_{In addition, the index name must be evidensapp_polystructures_geom_idx .}

Also, create an index on (brgy_locat, municipali) if you don't already have one:

 CREATE INDEX foo_idx ON evidensapp_polystructures (brgy_locat, municipali);

Alternative with `LATERAL` join

Since you only have 6 lines in evidensapp_seniangcbr , LATERAL can be faster:

 WITH hi AS ( SELECT ps.brgy_locat, ps.municipali, fh.hazard, count(*) AS ct FROM evidensapp_seniangcbr fh , LATERAL ( SELECT ps.brgy_locat, ps.municipali FROM evidensapp_polystructures ps WHERE ST_Intersects(fh.geom, ps.geom) ) ps WHERE fh.hazard = 'High' GROUP BY 1, 2, 3 ) , med AS ( SELECT ps.brgy_locat, ps.municipali, fh.hazard, count(*) AS ct FROM evidensapp_seniangcbr fh , LATERAL ( SELECT ps.brgy_locat, ps.municipali FROM evidensapp_polystructures ps LEFT JOIN hi USING (brgy_locat, municipali) WHERE hi.brgy_locat IS NULL AND ST_Intersects(fh.geom, ps.geom) ) ps WHERE fh.hazard = 'Medium' GROUP BY 1, 2, 3 ) TABLE hi UNION ALL TABLE med UNION ALL SELECT ps.brgy_locat, ps.municipali, fh.hazard, count(*) AS ct FROM evidensapp_seniangcbr fh , LATERAL ( SELECT ps.id, ps.brgy_locat, ps.municipali FROM evidensapp_polystructures ps LEFT JOIN hi USING (brgy_locat, municipali) LEFT JOIN med USING (brgy_locat, municipali) WHERE hi.brgy_locat IS NULL AND med.brgy_locat IS NULL AND ST_Intersects(fh.geom, ps.geom) ) ps WHERE fh.hazard = 'Low' GROUP BY 1, 2, 3;

About LATERAL joins:

What is the difference between LATERAL and a subquery in PostgreSQL?

Optimization of ST_Intersects in PostgreSQL (PostGIS)

Index

Alternative with LATERAL join

More articles:

Alternative with `LATERAL` join