Group coordinates in proximity to each other

I am building a REST API, so the answer cannot rely on Google Maps or JavaScript. In our application, we have a table containing messages that looks like this:

ID | latitude    | longitude   | other_stuff
1  | 50.4371243  | 5.9681102   | ...
2  | 50.3305477  | 6.9420498   | ...
3  | -33.4510148 | 149.5519662 | ...

We have a view with a map that shows all the messages around the world. Hopefully we will have a lot of messages, and it would be ridiculous to show thousands and thousands of markers on the map. Therefore, we want to group them by proximity so that we end up with something like 2-3 markers per continent.

To be clear, we need something like the marker clustering shown at https://github.com/googlemaps/js-marker-clusterer

I did some research and found that k-means seems to be part of the solution. Since I'm really bad at math, I tried a couple of PHP libraries like this one: https://github.com/bdelespierre/php-kmeans , which seems to do a decent job. However, there is a drawback: I have to parse the whole table every time the map loads. Performance is terrible.

So, I would like to know whether anyone has already dealt with this problem or knows a better solution.

1 answer

I kept searching and found an alternative to k-means: geohash.

Wikipedia explains what it is better than I can: Wiki geohash

But to summarize: the world map is divided into a grid of 32 cells, and each cell is assigned an alphanumeric character. Each cell is in turn divided into 32 cells, and so on for 12 levels. Therefore, if I GROUP BY the first character of the hash, I get my clusters for the lowest zoom level; if I need higher precision, I just group by the first N characters of the hash.
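To make the subdivision concrete, here is a minimal Python sketch of the standard geohash encoding as described on Wikipedia. It is not the code of the library used in this answer; the function name `geohash_encode` and its `precision` parameter are my own choices for illustration, and the sample coordinate in the usage line is arbitrary. Each character packs 5 bits (hence 32 cells per level), and the bits alternate between halving the longitude range and halving the latitude range.

```python
# Standard geohash alphabet: digits plus lowercase letters without a, i, l, o.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(latitude, longitude, precision=12):
    """Encode a coordinate as a geohash string of `precision` characters.

    Hypothetical helper written for illustration only.
    """
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0          # the 5 bits of the character being built
    bit_count = 0
    use_lon = True    # bits alternate, starting with longitude

    while len(chars) < precision:
        rng, value = (lon_range, longitude) if use_lon else (lat_range, latitude)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1   # upper half of the range
            rng[0] = mid
        else:
            bits = bits << 1         # lower half of the range
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:           # 5 bits -> one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(42.605, -5.603, precision=5))  # ezs42
```

Because every extra character only subdivides the cell of the previous one, a short hash is always a prefix of the longer hash for the same point, which is what makes grouping by prefix work.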

So, what I did is add only one field to my table and generate a hash corresponding to my coordinates:

ID | latitude    | longitude   | geohash      | other_stuff
1  | 50.4371243  | 5.9681102   | csyqm73ymkh2 | ...
2  | 50.3305477  | 6.9420498   | p24k1mmh98eu | ...
3  | -33.4510148 | 149.5519662 | 8x2s9674nd57 | ...

Now, if I want to get my clusters, I just need to make a simple request:

 SELECT count(*) as nb_markers FROM mtable GROUP BY SUBSTRING(geohash,1,2); 

In the SUBSTRING call, 2 is the precision level; it should be between 1 and 12.
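A REST endpoint usually needs a position for each cluster marker as well as the count. The sketch below is one possible way to get both, using Python's built-in `sqlite3` with an in-memory copy of `mtable`. The function name `cluster_markers`, the choice of the average coordinate as the cluster position, and rows 4 and 5 (including their hashes) are assumptions made up for illustration; rows 1 to 3 are copied from the table above. SQLite spells the function `substr`, other databases may use `SUBSTRING`.

```python
import sqlite3


def cluster_markers(conn, precision):
    """Return (cell, nb_markers, center_lat, center_lon) per geohash prefix.

    Hypothetical helper: the cluster position is simply the average
    coordinate of the markers in the cell.
    """
    if not 1 <= precision <= 12:
        raise ValueError("precision must be between 1 and 12")
    query = """
        SELECT substr(geohash, 1, :n) AS cell,
               COUNT(*)               AS nb_markers,
               AVG(latitude)          AS center_lat,
               AVG(longitude)         AS center_lon
        FROM mtable
        GROUP BY substr(geohash, 1, :n)
        ORDER BY cell
    """
    return conn.execute(query, {"n": precision}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mtable ("
    "id INTEGER PRIMARY KEY, latitude REAL, longitude REAL, geohash TEXT)"
)
conn.executemany(
    "INSERT INTO mtable VALUES (?, ?, ?, ?)",
    [
        (1, 50.4371243, 5.9681102, "csyqm73ymkh2"),
        (2, 50.3305477, 6.9420498, "p24k1mmh98eu"),
        (3, -33.4510148, 149.5519662, "8x2s9674nd57"),
        (4, 50.44, 5.97, "csyqm73ymkh9"),  # made-up row for illustration
        (5, 50.50, 6.10, "csz000000000"),  # made-up row for illustration
    ],
)

# One row per cluster at precision 2.
for cell, nb_markers, center_lat, center_lon in cluster_markers(conn, 2):
    print(cell, nb_markers, center_lat, center_lon)
```

The API can map the requested zoom level to a precision and return one marker per row. Averaging longitudes is only a rough centre and would need care for cells that touch the 180° meridian.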

PS: Lib I used to generate my hash
