Can I find the closest locations by lat/long in better than O(n) time?

I am wondering if there is an algorithm for finding the nearest location (represented by lat/long) in better than O(n) time.

I know that I could use the Haversine formula to get the distance from the reference point to each place and sort ASC, but this is inefficient for large datasets.

How does the MySQL DISTANCE() function execute? I assume it is O(n)?

+4
10 answers

If you store your points in a kd-tree, you can do it in O(log n) time on average, or O(sqrt(n)) in the worst case.
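For illustration, here is a minimal sketch using SciPy's cKDTree (my choice of library, not something this answer specifies). A kd-tree works on Euclidean coordinates, so the lat/long pairs are first projected onto the unit sphere; nearest-by-chord then agrees with nearest-by-great-circle.

```python
import numpy as np
from scipy.spatial import cKDTree

def to_unit_xyz(lat_deg, lon_deg):
    """Convert degree lat/long pairs to points on the unit sphere."""
    lat = np.radians(np.asarray(lat_deg))
    lon = np.radians(np.asarray(lon_deg))
    return np.column_stack((np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)))

# Hypothetical dataset of (lat, long) rows.
places = np.array([[40.7128, -74.0060],   # New York
                   [51.5074, -0.1278],    # London
                   [35.6762, 139.6503]])  # Tokyo

tree = cKDTree(to_unit_xyz(places[:, 0], places[:, 1]))

# Nearest place to Paris; each query is O(log n) on average. Chord
# distance on the sphere orders points the same way great-circle
# distance does, so the nearest hit is the true geographic nearest.
_, idx = tree.query(to_unit_xyz([48.8566], [2.3522]))
print(places[idx[0]])   # -> London's coordinates
```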

+8

Better than O(n)? Only if you go the radix-sort route or store locations under hash keys that represent the general area in which they lie.

For example, you could divide the globe into cells of a minute of latitude by a minute of longitude, enumerate the resulting areas, and hash each location to its area. When the time comes to find the nearest place, you then need to check at most 9 hash keys: you can check in advance whether a neighbouring cell could possibly hold a closer location than the best found so far, which cuts down the number of places you compute distances to. It is still O(n), but with a much smaller constant factor. Properly implemented, you will not even notice it.
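A minimal sketch of the hashing idea, using 1-degree cells rather than minutes purely to keep the example short, and assuming the nearest point lies within the surrounding 3x3 block of cells:

```python
from collections import defaultdict
from math import radians, sin, cos, asin, sqrt, floor

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def cell(lat, lon, size=1.0):
    """Hash key for the grid cell a point falls in."""
    return (floor(lat / size), floor(lon / size))

grid = defaultdict(list)
for place in [(40.7, -74.0), (40.8, -73.9), (51.5, -0.1)]:
    grid[cell(*place)].append(place)

def nearest(lat, lon):
    # Check the query's cell and its 8 neighbours: at most 9 hash keys.
    r, c = cell(lat, lon)
    best, best_d = None, float("inf")
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            for p in grid[(r + dr, c + dc)]:
                d = haversine_km(lat, lon, *p)
                if d < best_d:
                    best, best_d = p, d
    return best

print(nearest(40.72, -74.01))  # -> (40.7, -74.0)
```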

Or, if the data is in memory or otherwise randomly accessible, you can keep two copies of it, one sorted by latitude and one by longitude. You then use binary search to find the nearest latitude and the nearest longitude in the respective copies, and keep reading entries outward from there (i.e. the preceding and following places) until it becomes impossible to find a closer location.

You know you cannot find a closer location when the next entry on either side of the latitude-sorted data has a latitude gap that alone is already larger than the best distance found so far, even if it lay on the same longitude as the query point. The analogous test applies to the longitude-sorted data.

This really does give you better than O(n), closer to O(log n) I think, but it requires random rather than sequential access to the data, and it duplicates all the data (or at least the keys to it).
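A minimal sketch of the latitude-sorted half of this scheme, with my own simplifications: plain Euclidean distance in degrees instead of haversine, and only the latitude-sorted copy of the data:

```python
import bisect
from math import hypot

def nearest(by_lat, qlat, qlon):
    """by_lat: list of (lat, lon) tuples sorted by latitude."""
    lats = [p[0] for p in by_lat]
    i = bisect.bisect_left(lats, qlat)   # O(log n) entry point
    best, best_d = None, float("inf")
    lo, hi = i - 1, i
    while lo >= 0 or hi < len(by_lat):
        # Stop expanding a side once its latitude gap alone already
        # exceeds the best distance so far: nothing on that side can
        # be closer, even if it shared the query's longitude.
        if lo >= 0:
            if abs(lats[lo] - qlat) >= best_d:
                lo = -1                  # prune the low side
            else:
                d = hypot(lats[lo] - qlat, by_lat[lo][1] - qlon)
                if d < best_d:
                    best, best_d = by_lat[lo], d
                lo -= 1
        if hi < len(by_lat):
            if abs(lats[hi] - qlat) >= best_d:
                hi = len(by_lat)         # prune the high side
            else:
                d = hypot(lats[hi] - qlat, by_lat[hi][1] - qlon)
                if d < best_d:
                    best, best_d = by_lat[hi], d
                hi += 1
    return best

places = sorted([(40.7, -74.0), (51.5, -0.1), (35.7, 139.7)])
print(nearest(places, 48.9, 2.3))  # -> (51.5, -0.1)
```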

+2

You mentioned MySQL, but there are some fairly sophisticated spatial features in SQL Server 2008, including a geography data type. There is some information out there about the kinds of things you are asking. I don't know enough about spatial queries to speak to performance, but I doubt there is a constant-time algorithm to do what you ask; you could, however, perform some fast set operations over your places.

+2

If the data set is static, for example the coordinates of all gas stations in the US, then a proper index (a BSP tree, for instance) will allow efficient searching. Postgres has had good support for two-dimensionally indexed data since the mid-90s, so you can do exactly that.
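For example, with the PostGIS module (one way to get an indexed 2-D search in Postgres; the table and column names below are made up for illustration), the <-> operator performs a KNN search driven by the GiST index rather than a full scan:

```python
import psycopg2  # pip install psycopg2-binary

conn = psycopg2.connect(dbname="geo")  # assumed connection details
cur = conn.cursor()

# Assumes: gas_stations(id, geom GEOMETRY(Point, 4326)) with a GiST
# index on geom. ORDER BY geom <-> <point> walks the index in
# distance order, so LIMIT 1 stops after the nearest row.
cur.execute(
    """
    SELECT id
    FROM gas_stations
    ORDER BY geom <-> ST_SetSRID(ST_MakePoint(%s, %s), 4326)
    LIMIT 1
    """,
    (-73.95, 40.75),  # (longitude, latitude)
)
print(cur.fetchone())
```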

+2

I wrote an article in DDJ a couple of years ago about finding the closest line, using a grid (I call them quadrants). Using it to find the nearest point (instead of lines) would simply be a specialization of it.

Using quadrants reduces the time significantly, even though the complexity is mathematically unchanged (all points could theoretically lie in a single quadrant). A prerequisite for using quadrants/grids is that you have a maximum distance for the searched object. If you are simply looking for the nearest point, with no maximum distance given, you cannot use quadrants.

In that case, have a look at A Template for the Nearest Neighbor Problem (Larry Andrews in DDJ), which has O(log n) retrieval complexity. I did not compare the runtimes of the two algorithms; probably, given a reasonable maximum distance, quadrants are better. The better general-purpose algorithm is the one from Larry Andrews.

+2

If you are looking for the one closest place, there is no need to sort. Just iterate over your list, computing the distance to each point and keeping track of the nearest one. By the time you have walked the list, you have your answer.
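A minimal sketch of that single pass, O(n) but with no O(n log n) sort:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

places = [(40.7, -74.0), (51.5, -0.1), (35.7, 139.7)]
query = (48.86, 2.35)  # Paris

# One pass tracking the minimum: no sort required.
nearest = min(places, key=lambda p: haversine_km(*query, *p))
print(nearest)  # -> (51.5, -0.1)
```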

It would be even better to introduce the concept of a grid: assign each point to a grid cell, and to search, first determine which cell the query falls in and compute distances only against the points in that cell. You have to be careful, though: if the query location is close to a cell boundary, you also need to look in the neighbouring cells. Even so, this is likely to perform very well.

+1

I have not looked into this myself, but Postgres has a module designed for managing GIS data.

In an application I worked on in a previous life, we took all the data, computed a quadtree key for it (for 2D space, or an octree key for 3D space), and stored that key in the database. It was then easy to load the values back from the database (so the quadtree did not have to be recomputed) and run the standard quadtree search algorithm.

This, of course, means you touch all the data at least once to get it into the data structure. But persisting that structure means you get better search speeds from then on; I would expect you to run many nearest-neighbour checks against each dataset.
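A minimal sketch of computing such a key; the depth and the digit encoding are my assumptions for illustration, not the poster's exact scheme:

```python
# Each character of the key picks one quadrant at the next level of
# the quadtree, so points sharing a prefix are spatially close, and
# the keys can be stored and range-scanned in a database.
def quadkey(lat, lon, depth=16):
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    key = []
    for _ in range(depth):
        lat_mid = (lat_lo + lat_hi) / 2
        lon_mid = (lon_lo + lon_hi) / 2
        digit = 0
        if lon >= lon_mid:           # east half -> bit 1
            digit += 1
            lon_lo = lon_mid
        else:
            lon_hi = lon_mid
        if lat >= lat_mid:           # north half -> bit 2
            digit += 2
            lat_lo = lat_mid
        else:
            lat_hi = lat_mid
        key.append(str(digit))
    return "".join(key)

print(quadkey(40.7128, -74.0060))  # store this string, indexed
```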

(Wikipedia has a good explanation of kd-trees: http://en.wikipedia.org/wiki/Kd-tree)

+1

What you want is a spatial index. Fortunately, MySQL provides exactly that in its Spatial Extensions. They use an R-Tree index internally, although it shouldn't really matter what they use. There are plenty of details on the linked manual page.
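A hedged sketch of how this can look from Python (the mysql-connector driver, the table and column names, and the 0.5-degree search window are all assumptions for illustration; ST_Distance_Sphere needs MySQL 5.7+). Note that ORDER BY distance alone still scans every row; it is the bounding-box predicate that lets the SPATIAL INDEX prune candidates:

```python
import mysql.connector  # pip install mysql-connector-python

conn = mysql.connector.connect(user="app", password="...", database="geo")
cur = conn.cursor()

# Assumes: places(id INT, location POINT NOT NULL,
#                 SPATIAL INDEX (location)),
# with points stored as (longitude, latitude).
lon, lat, half = -73.95, 40.75, 0.5  # query point and window (degrees)
bbox = (f"POLYGON(({lon - half} {lat - half},{lon + half} {lat - half},"
        f"{lon + half} {lat + half},{lon - half} {lat + half},"
        f"{lon - half} {lat - half}))")

cur.execute(
    """
    SELECT id, ST_Distance_Sphere(location, POINT(%s, %s)) AS metres
    FROM places
    WHERE MBRContains(ST_GeomFromText(%s), location)
    ORDER BY metres
    LIMIT 1
    """,
    (lon, lat, bbox),
)
print(cur.fetchone())
```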

+1

I think you could do it theoretically if you had a big enough table to look the answers up in... and, secondly, maybe caching things correctly can lead to a very good average case?

0

An R-Tree index can be used to speed up spatial queries like this. Once built, it allows such searches to run in better than O(n).

0
