the DBSCAN algorithm is probably what you are looking for.
This is an algorithm that finds clusters of points based on their density. Since in your case, popularity means density, you can use it to solve this problem. Two parameters are required for this:
- eps , the radius around each point where you need to look for other points to form a cluster,
- minPts , the required number of points in the radius around the point to consider a group of nodes as a cluster.
You also need a function to measure the distance between points. Since you have (latitude, longitude) pairs (usually in WGS84 format), Haversize distance is what you need.
There are several options for implementing the algorithm. If you use Java, Apache Commons Math provides a decent implementation (see here for more details and a code snippet). Call DBSCANClusterer with eps = 1.0 (radius 1 km) and minPts = 0 (clusters have 1 or more points). See this answer for a Haversine distance implementation (make sure it matches the same unit as eps). Finally, sort the clusters by reducing the size to sort them by "popularity":
Collections.sort(clusters, (o1, o2) -> Integer.compare(o2.getSize(), o1.getSize()); Cluster<? extends Clusterable> mostPopular = clusters.get(0);
If I remember correctly, this implementation solves the problem in quadratic time relative to its size (the number of points). If all instances of the problem you encounter are the same size as your example, you will have no problem.
source share