MongoDB: Cluster documents by geographic location, given an area and a maximum number of points?

I am trying to develop a map-based visualization that includes a “heat map” of subpopulations, based on a MongoDB collection that contains documents like this:

    {
      "PlaceName" : "Boston",
      "Location" : {
        "type" : "Point",
        "coordinates" : [ 42.358056, -71.063611 ]
      },
      "Subpopulations": {
        "Age": {
          "0_4" : 37122,
          "6_11" : 33167,
          "12_17" : 35464,
          "18_24" : 130885,
          "25_34" : 127058,
          "34_44" : 79092,
          "45_54" : 72076,
          "55_64" : 59766,
          "65_74" : 33997,
          "75_84" : 20219,
          "85_" : 9057
        }
      }
    }

The database contains hundreds of thousands of individual locations. They do not intersect - that is, there will not be two separate entries for New York and Manhattan.

The goal is to use Leaflet.js and some plugins to render various visualizations of this data. Leaflet is good enough at clustering data on the client side, so if I passed it a thousand locations with density values, it could render a heat map of the relevant area simply by summing the individual values.

The problem arises when I zoom out to show, say, the whole world. It would be terribly inefficient, if not impossible, to send all of this data to the client and expect it to process the information quickly enough for a smooth visualization.

So what I need to do is cluster the data automatically on the server side, which I hope can be done in the MongoDB query. I have read that a geohash can be a good starting point for determining which points belong to which clusters, but I'm sure someone has done exactly this before and has better insight than that. Ideally, I would like to send a request to my Node.js script that looks like this:

 http://myserver.com/popdata?top=42.48&left=-80.57&bottom=37.42&right=-62.55&stat=Age&value=6_11 

which would determine how granular the clustering should be based on how many individual points fall within the specified geographic area, given a maximum number of data points to return, or something along those lines; and it would return data like this:

    [
      { "clusterlocation": [ 42.304, -72.622 ], "total_age_6_11": 59042 },
      { "clusterlocation": [ 36.255, -64.124 ], "total_age_6_11": 7941 },
      { "clusterlocation": [ 40.425, -70.693 ], "total_age_6_11": 90257 },
      { "clusterlocation": [ 39.773, -67.992 ], "total_age_6_11": 102752 },
      ...
    ]

... where "clusterlocation" is something like the average of all document locations in the cluster, and "total_age_6_11" is the sum of those documents' values for "Subpopulations.Age.6_11".

Is this something I can do entirely within a MongoDB query? Is there a “tried and tested” way to do this well?

1 answer

Even if you could run this query at request time, it would be too inefficient and too slow to make for a good user experience. I would suggest precomputing clusters of certain sizes and storing them in your existing collection alongside your original documents. Here's how:

  • Each document stores an additional field (let's call it geolevel) that indicates how small or large an entity it represents. Your base documents will have geolevel = 1:

      {
        "PlaceName" : "Boston",
        "Location" : {
          "type" : "Point",
          "coordinates" : [ 42.358056, -71.063611 ]
        },
        "Subpopulations": {
          "Age": {
            "0_4" : 37122,
            "6_11" : 33167,
            "12_17" : 35464,
            "18_24" : 130885,
            "25_34" : 127058,
            "34_44" : 79092,
            "45_54" : 72076,
            "55_64" : 59766,
            "65_74" : 33997,
            "75_84" : 20219,
            "85_" : 9057
          }
        },
        "geolevel": 1 // added geolevel
      }

  • You can then run a batch process on your database to pre-generate similar documents for clusters, at several levels. For example, geolevel: 2 would be a cluster of several cities within a radius of 250 km, and geolevel: 3 would be a cluster of geolevel: 2 clusters.

  • You can also keep a field such as memberids (named childs in the example below) that stores the ids of the children in each cluster. This may be needed to keep an entity from being counted in two neighboring clusters; such an entity can be assigned to either of the neighbors, and your visualization will still work fine. A geolevel: 2 cluster document would look as follows:

      {
        "PlaceName" : "cluster_sdfs34535", // The id can be generated from hash like sha of a list of all children ids.
        "Location" : { // center of the cluster
          "type" : "Point",
          "coordinates" : [ 42.358056, -71.063611 ]
        },
        "Subpopulations": { // total population of the cluster
          "Age": {
            "0_4" : 371220,
            "6_11" : 331670,
            "12_17" : 354640,
            "18_24" : 1308850,
            "25_34" : 1270580,
            "34_44" : 790920,
            "45_54" : 720760,
            "55_64" : 597660,
            "65_74" : 339970,
            "75_84" : 202190,
            "85_" : 90570
          }
        },
        "geolevel": 2,
        "childs": [ 4, 5, 6, 7 ] // ids of child documents
      }

  • Now your visualization application should map the current zoom level to a geolevel and select documents based on that. For a city-level view you can query for geolevel: 1 documents, and as you zoom out to state, country, etc., you can increase the geolevel to 2, 3, ...
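
The steps above could be sketched roughly as follows in plain JavaScript. This is only an illustration under several assumptions that the answer does not prescribe: clusters are formed by bucketing documents into fixed-size grid cells (rather than by a 250 km radius), the cell sizes in CELL_DEGREES and the zoom thresholds in geolevelForZoom are made-up values, and both function names are hypothetical. Each child document is assumed to carry an _id.

```javascript
// Sketch only: grid bucketing, cell sizes and zoom thresholds are assumptions.

// Hypothetical grid cell size, in degrees, for each geolevel above 1.
const CELL_DEGREES = { 2: 2.5, 3: 10 };

// Build cluster documents for `level` from the documents one level below.
// Coordinates are read in whatever order the collection stores them.
function buildClusters(children, level) {
  const size = CELL_DEGREES[level];
  const cells = new Map();
  for (const doc of children) {
    const [a, b] = doc.Location.coordinates;
    // Every document falls into exactly one cell, so nothing is counted twice.
    const key = `${Math.floor(a / size)}:${Math.floor(b / size)}`;
    let cell = cells.get(key);
    if (!cell) {
      cell = { sumA: 0, sumB: 0, age: {}, childs: [] };
      cells.set(key, cell);
    }
    cell.sumA += a;
    cell.sumB += b;
    cell.childs.push(doc._id);
    // Sum every age bucket of the child into the cluster totals.
    for (const [bucket, count] of Object.entries(doc.Subpopulations.Age)) {
      cell.age[bucket] = (cell.age[bucket] || 0) + count;
    }
  }
  return [...cells.entries()].map(([key, cell]) => ({
    PlaceName: `cluster_${level}_${key}`, // hypothetical naming scheme
    Location: {
      type: "Point",
      // Unweighted average of the child locations.
      coordinates: [
        cell.sumA / cell.childs.length,
        cell.sumB / cell.childs.length,
      ],
    },
    Subpopulations: { Age: cell.age },
    geolevel: level,
    childs: cell.childs,
  }));
}

// Hypothetical mapping from a Leaflet zoom level to a geolevel.
function geolevelForZoom(zoom) {
  if (zoom >= 8) return 1; // city-level view: base documents
  if (zoom >= 5) return 2; // state-level view
  return 3;                // country level and beyond
}
```

The cluster documents would be generated once, offline, and stored next to the base documents; at request time the server would then only need to add something like { geolevel: geolevelForZoom(zoom) } to its query filter.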