I am trying to develop a map-based visualization that includes a “heat map” of subpopulations, based on a MongoDB collection containing documents like this:
{ "PlaceName" : "Boston", "Location" : { "type" : "Point", "coordinates" : [ 42.358056, -71.063611 ] }, "Subpopulations": { "Age": { "0_4" : 37122, "6_11" : 33167, "12_17" : 35464, "18_24" : 130885, "25_34" : 127058, "34_44" : 79092, "45_54" : 72076, "55_64" : 59766, "65_74" : 33997, "75_84" : 20219, "85_" : 9057 } } }
The database contains hundreds of thousands of individual locations. They do not overlap: for example, there will not be separate entries for both New York and Manhattan.
The goal is to use Leaflet.js and some plugins to render various visualizations of this data. Leaflet is good enough at clustering data on the client side: if I gave it a thousand locations with density values, it could display a heat map of the corresponding area simply by combining all the individual values.
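On the client side, the step of handing locations with density values to a heat-map plugin might look roughly like this minimal sketch. It assumes the Leaflet.heat plugin, whose `L.heatLayer` takes `[lat, lng, intensity]` triples; the helper name `toHeatPoints` and the `{ lat, lng, value }` input shape are hypothetical.

```javascript
// Hypothetical helper: convert { lat, lng, value } records into the
// [lat, lng, intensity] triples that a heat-map layer such as Leaflet.heat accepts.
// Intensities are scaled to 0..1 relative to the largest value in this batch.
function toHeatPoints(points) {
  // reduce (rather than Math.max(...spread)) avoids argument-count limits on big arrays
  const max = points.reduce((m, p) => Math.max(m, p.value), 0);
  return points.map((p) => [p.lat, p.lng, max > 0 ? p.value / max : 0]);
}
```

With Leaflet.heat loaded, usage would be along the lines of `L.heatLayer(toHeatPoints(data), { radius: 25 }).addTo(map);`. Scaling to 0..1 matches the plugin's default `max` intensity of 1.0.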
The problem arises when I zoom out to show the whole world. It would be terribly inefficient, if not impossible, to send all of this data to the client and process it quickly enough for a smooth visualization.
So what I need to do is cluster the data automatically on the server, which I hope can be done in the MongoDB query itself. I have read that geohashes can be a good starting point for determining which points belong to which cluster, but I'm sure someone has done exactly this before and understands it better than that. Ideally, I would like to send a request to my Node.js script that looks like this:
http://myserver.com/popdata?top=42.48&left=-80.57&bottom=37.42&right=-62.55&stat=Age&value=6_11
The script would determine how granular the clustering should be based on how many individual points fall within the specified geographic area, given a maximum number of returned data points or something along those lines, and it would return data like this:
[ { "clusterlocation": [ 42.304, -72.622 ], "total_age_6_11": 59042 }, { "clusterlocation": [ 36.255, -64.124 ], "total_age_6_11": 7941 }, { "clusterlocation": [ 40.425, -70.693 ], "total_age_6_11": 90257 }, { "clusterlocation": [ 39.773, -67.992 ], "total_age_6_11": 102752 }, ... ]
... where "clusterlocation" is something like the average of all document locations in the cluster, and "total_age_6_11" is the sum of "Subpopulations.Age.6_11" over those documents.
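The geohash idea works because points whose geohashes share a prefix lie in the same rectangular cell, and a shorter prefix means a larger cell. A minimal JavaScript encoder sketch follows (the name `encodeGeohash` is hypothetical). It could be run once per document at import time and its output stored in a field, so that clustering reduces to grouping documents by a prefix of that field.

```javascript
// Base-32 alphabet used by geohash (omits a, i, l, o).
const GEOHASH_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

// Hypothetical helper: encode a latitude/longitude pair as a geohash string.
// Bits alternate longitude, latitude, longitude, ... starting with longitude;
// each bit halves the remaining interval, and every 5 bits become one character.
function encodeGeohash(lat, lng, precision = 6) {
  let latMin = -90, latMax = 90;
  let lngMin = -180, lngMax = 180;
  let hash = "";
  let bits = 0;        // bits accumulated in `ch` so far
  let ch = 0;          // current 5-bit character value
  let evenBit = true;  // true: the next bit refines longitude

  while (hash.length < precision) {
    if (evenBit) {
      const mid = (lngMin + lngMax) / 2;
      if (lng >= mid) { ch = (ch << 1) | 1; lngMin = mid; } else { ch <<= 1; lngMax = mid; }
    } else {
      const mid = (latMin + latMax) / 2;
      if (lat >= mid) { ch = (ch << 1) | 1; latMin = mid; } else { ch <<= 1; latMax = mid; }
    }
    evenBit = !evenBit;
    if (++bits === 5) {
      hash += GEOHASH_BASE32[ch];
      bits = 0;
      ch = 0;
    }
  }
  return hash;
}
```

For example, `encodeGeohash(42.6, -5.6, 5)` returns `"ezs42"`, and its first three characters, `"ezs"`, name the coarser cell that contains it.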
Is this something I can do entirely within a MongoDB query? Is there a “tried and tested” way to do this well?
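One possible shape for the server side is sketched below under several loudly stated assumptions. It assumes each document already has a precomputed geohash string in a hypothetical `Geohash` field, and that `Location.coordinates` follows GeoJSON's `[longitude, latitude]` order. Note that the sample document lists 42.358056, a latitude, first, so with that data the two array indices would need swapping. The helper names `choosePrecision` and `buildClusterPipeline`, and the `maxClusters` cap, are hypothetical; the request in the question has no such parameter. The precision heuristic picks the longest geohash prefix whose cells cover the box with at most `maxClusters` cells. The pipeline then filters to the box, groups by that prefix, averages the coordinates, and sums the requested statistic.

```javascript
// Hypothetical: longest geohash prefix length whose cells cover the box
// with at most `maxClusters` cells (a rough upper-bound estimate).
function choosePrecision({ top, left, bottom, right }, maxClusters, maxPrecision = 12) {
  const width = Math.abs(right - left);  // degrees; no antimeridian handling
  const height = Math.abs(top - bottom);
  let best = 1;
  for (let p = 1; p <= maxPrecision; p++) {
    const lngBits = Math.ceil((5 * p) / 2); // geohash starts with a longitude bit
    const latBits = Math.floor((5 * p) / 2);
    const cellW = 360 / 2 ** lngBits;
    const cellH = 180 / 2 ** latBits;
    // Upper bound: the box may straddle one extra cell on each axis.
    const cells = (Math.ceil(width / cellW) + 1) * (Math.ceil(height / cellH) + 1);
    if (cells > maxClusters) break;
    best = p;
  }
  return best;
}

// Hypothetical: aggregation pipeline clustering by prefix of a precomputed `Geohash` field.
function buildClusterPipeline(box, stat, value, maxClusters = 500) {
  // Accept only plain key names so query-string input cannot inject `$` or dotted paths.
  const safeKey = /^[A-Za-z0-9_]+$/;
  if (!safeKey.test(stat) || !safeKey.test(value)) {
    throw new Error("invalid stat/value");
  }
  const { top, left, bottom, right } = box;
  const precision = choosePrecision(box, maxClusters);
  const totalField = `total_${stat.toLowerCase()}_${value}`;

  return [
    // Keep only points inside the requested box (closed ring, [lng, lat] order).
    {
      $match: {
        Location: {
          $geoWithin: {
            $geometry: {
              type: "Polygon",
              coordinates: [[[left, bottom], [right, bottom], [right, top], [left, top], [left, bottom]]],
            },
          },
        },
      },
    },
    // One group per geohash cell at the chosen precision.
    {
      $group: {
        _id: { $substrBytes: ["$Geohash", 0, precision] },
        lng: { $avg: { $arrayElemAt: ["$Location.coordinates", 0] } },
        lat: { $avg: { $arrayElemAt: ["$Location.coordinates", 1] } },
        [totalField]: { $sum: `$Subpopulations.${stat}.${value}` },
      },
    },
    // Reshape to the requested output: [lat, lng] for Leaflet plus the total.
    { $project: { _id: 0, clusterlocation: ["$lat", "$lng"], [totalField]: 1 } },
  ];
}
```

With the official Node.js driver, the result could be fetched with something like `collection.aggregate(buildClusterPipeline(box, "Age", "6_11")).toArray()`. Because `$geoWithin` with a GeoJSON polygon uses spherical geometry, the box edges are great-circle arcs rather than lines of latitude. Very large boxes, such as a whole-world view, may not behave as expected, so such requests might skip the `$match` stage entirely.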