How to improve the performance of queries computing the haversine formula?

Given a table of locations with latitudes and longitudes, which of these places is closest to a given location?

Of course, finding distances on the surface of the earth means using the distances of the Great Circle developed with the Haversin formula, also called the spherical cosine formula.

I have the following code:

SELECT zip, latitude, longitude, distance FROM ( SELECT z.zip, z.latitude, z.longitude, p.radius, p.distance_unit * DEGREES(ACOS(COS(RADIANS(p.latpoint)) * COS(RADIANS(z.latitude)) * COS(RADIANS(p.longpoint - z.longitude)) + SIN(RADIANS(p.latpoint)) * SIN(RADIANS(z.latitude)))) AS distance FROM zip AS z JOIN ( /* these are the query parameters */ SELECT 42.81 AS latpoint, -70.81 AS longpoint, 50.0 AS radius, 111.045 AS distance_unit ) AS p ON 1=1 WHERE z.latitude BETWEEN p.latpoint - (p.radius / p.distance_unit) AND p.latpoint + (p.radius / p.distance_unit) AND z.longitude BETWEEN p.longpoint - (p.radius / (p.distance_unit * COS(RADIANS(p.latpoint)))) AND p.longpoint + (p.radius / (p.distance_unit * COS(RADIANS(p.latpoint)))) ) AS d WHERE distance <= radius 

Is there a way to improve the performance of this request?

Do I need to use PostGIS to improve it, or just a wrapper for my haversine formula?

+7
performance postgresql gis location
source share
3 answers

I assume that the planner will work on this request, rewriting everything on its own, but it is worth a try. At least he's more neat.

 select zip, latitude, longitude, distance from ( select z.zip, z.latitude, z.longitude, p.radius, p.distance_unit * p.degrees_acos_cos_radians_latpoint * cos(radians(z.latitude)) * cos(radians(p.longpoint - z.longitude)) + p.sin_radians_latpoint * sin(radians(z.latitude)))) as distance from zip z cross join ( select latpoint, longpoint, radius, distance_unit, latpoint - radius / distance_unit as lat0, latpoint + radius / distance_unit as lat1, longpoint - radius / distance_unit * cos(radians(latpoint)) as long0, longpoint + radius / distance_unit * cos(radians(latpoint)) as long1, sin(radians(latpoint)) as sin_radians_latpoint, degrees(acos(cos(radians(latpoint)) as degrees_acos_cos_radians_latpoint from ( values (42.81, -70.81, 50.0, 111.045) ) v (latpoint, longpoint, radius, distance_unit) ) p where z.latitude between lat0 and lat1 and z.longitude between long0 and long1 ) d where distance <= radius 
+1
source share

This request will never be particularly fast. However, there are some ways to improve it.

First one . The Haversin formula is not needed here. The amendments that he applies are only necessary when the curvature of the earth is a significant factor or very close to the poles. None of them matter here - the largest distance that needs to be calculated exactly is 12 miles, which is hardly even beyond the horizon. The earth is effectively flat on this scale, so the Pythagorean theorem is good enough to calculate distances.

One degree of latitude is about 69 miles, and at 52 ° north latitude (around which the Netherlands is located) the degree of longitude is cos (52 °) x 69 = 42.5 miles, so the formula becomes:

sqrt (pow (69 * (lat - $ latitude), 2) + pow (42.5 * (lng - $ longitude), 2))

Second : we can use the "scissor test" for latitude and longitude. If a point is more than 12 miles in any cardinal direction from your target point, it, of course, cannot be within the 12-mile circle of that point. We can use this fact for quick comparisons in latitude and longitude, completely skipping the calculation of distance. Using the numbers for one degree of latitude / longitude displayed above, we have:

WHERE (lat BETWEEN ($ latitude - 12 / 69.0) AND ($ latitude + 12 / 69.0)) AND (lng BETWEEN ($ latitude - 12 / 42.5) AND ($ longitude + 12 / 42.5))

Please note that this does not replace full distance verification! This is just the first step to quickly throw out points that may not be in the correct radius. With an in-place index on lat or lng, this will allow the database server to avoid checking many rows in the database.

+2
source share

Expressions are not a slow part. The problem with finding the closest is the difficulty of using the index to limit the number of rows to look at.

If you don't have them on z yet, they will help:

 INDEX(latitude), INDEX(longitude) 

If you already have them, make sure that one of them was actually used by the subquery.

The next step will be sharper (and more fruitful): http://mysql.rjweb.org/doc.php/latlng

0
source share

All Articles