Given a set of geolocations, how can I find the most popular places?

Given a set of geolocations (as latitude / longitude), how do I find the most popular location?

As an example, I put together a map containing many points:

Map Link

We see that - except for 1, 4, 6 and 9 - all points are grouped in about the same place. How could I calculate the average location of this group? Ideally, as the map becomes more populated, I would like to calculate the 2nd most popular location, 3rd place in popularity, etc.

How could I solve this?

Thanks in advance.

+4
source share
2 answers

If you need a simple solution that will give you a good grade and easily coded ...

  • Choose a radius for what you consider the size of the place (say 100 m)
  • Iterating around your places makes them the center of your circle.
  • Count how many other points are inside the circle using Point-in-Circle
  • The place with the most points inside the circle becomes the most popular, now remove all these points from your set and repeat for the 2nd most popular for nth most popular.

How to convert distance to latitude and longitude? convert latitude longitude to meters

For example, 1 degree latitude between 110574 and 111693 meters.

+3
source

the DBSCAN algorithm is probably what you are looking for.

This is an algorithm that finds clusters of points based on their density. Since in your case, popularity means density, you can use it to solve this problem. Two parameters are required for this:

  • eps , the radius around each point where you need to look for other points to form a cluster,
  • minPts , the required number of points in the radius around the point to consider a group of nodes as a cluster.

You also need a function to measure the distance between points. Since you have (latitude, longitude) pairs (usually in WGS84 format), Haversize distance is what you need.

There are several options for implementing the algorithm. If you use Java, Apache Commons Math provides a decent implementation (see here for more details and a code snippet). Call DBSCANClusterer with eps = 1.0 (radius 1 km) and minPts = 0 (clusters have 1 or more points). See this answer for a Haversine distance implementation (make sure it matches the same unit as eps). Finally, sort the clusters by reducing the size to sort them by "popularity":

 Collections.sort(clusters, (o1, o2) -> Integer.compare(o2.getSize(), o1.getSize()); Cluster<? extends Clusterable> mostPopular = clusters.get(0); 

If I remember correctly, this implementation solves the problem in quadratic time relative to its size (the number of points). If all instances of the problem you encounter are the same size as your example, you will have no problem.

+2
source

All Articles