I am new to MongoDB and map-reduce, and I want to evaluate spatial data using k-means spatial clustering. I found an article which seems to be a good description of the algorithm, but I don’t know how to translate it into a mongo shell script. Suppose my data looks like this:
{ _id: ObjectID(), loc: {x: <longitude>, y: <latitude>}, user: <userid> }
And I can use k = sqrt(n / 2), where n is the number of samples. I can use aggregates to get the bounding extents of the data, the counts, etc. I got kind of lost with the reference to a cluster points file, which I believe would just be another collection, and I don’t know how to iterate, or whether that would be done in the client or in the database.
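For what it’s worth, the k = sqrt(n / 2) rule and the random starting points can be sketched in plain JavaScript, which should also run in the mongo shell. The function names (chooseK, randomCentroids) and the layout of the bounds object are made up for illustration; n and the extents are assumed to come from the aggregate mentioned above.

```javascript
// Rule-of-thumb cluster count: k = sqrt(n / 2), but never less than 1.
function chooseK(n) {
    return Math.max(1, Math.round(Math.sqrt(n / 2)));
}

// Pick k random starting cluster points inside the bounding extent.
// bounds is a hypothetical object of the form {minX, maxX, minY, maxY}.
function randomCentroids(k, bounds) {
    var pts = [];
    for (var i = 0; i < k; i++) {
        pts.push([
            bounds.minX + Math.random() * (bounds.maxX - bounds.minX),
            bounds.minY + Math.random() * (bounds.maxY - bounds.minY)
        ]);
    }
    return pts;
}
```

The result has the same [[x, y], ...] shape as the pts array used further down, so it could serve as the initial cluster points.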
Well, I’ve made some progress on this: I created an array of random initial points, and every document needs to be compared against them by sum of squares during the map phase, but I don’t know how to get them into the map function. I wrote a map function:
var mapCluster = function() {
    // pts (the current cluster points) has to be supplied from outside this function
    var key = -1;
    var sos = 0;
    var pos = this.arguments.pos;
    for (var i = 0; i < pts.length; i++) {
        var dx = pts[i][0] - pos[0];
        var dy = pts[i][1] - pos[1];
        var sumOfSquares = dx*dx + dy*dy;
        // keep the index of the nearest cluster point seen so far
        if (i == 0 || sumOfSquares < sos) {
            key = i;
            sos = sumOfSquares;
        }
    }
    emit(key, pos);
};
Here pts is the array of cluster points, defined like this, which probably won’t work:
var pts = [ [x,y], [x1,y1], ... ];
So, for each map-reduce iteration, we compare every point in the collection with this array and emit the index of the cluster point closest to it. Then, in the reduce function, the average of the points associated with each index is used to create the new cluster point location. Finally, in the finalize function, I can update the cluster document.
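One thing to watch with the averaging step: MongoDB may call reduce more than once for the same key and feed earlier reduce output back in, so averaging directly inside reduce can give wrong results. A minimal sketch of one way around that, assuming the map function emits a value of the form {sx: x, sy: y, n: 1} instead of the raw position (the names reduceCluster, finalizeCluster, sx, sy and n are my own, not from the article):

```javascript
// Reduce: add up partial sums and counts. The output has the same shape
// as the emitted values, so partial results can safely be re-reduced.
var reduceCluster = function(key, values) {
    var out = {sx: 0, sy: 0, n: 0};
    for (var i = 0; i < values.length; i++) {
        out.sx += values[i].sx;
        out.sy += values[i].sy;
        out.n += values[i].n;
    }
    return out;
};

// Finalize: turn the sums into the new cluster point position [x, y].
var finalizeCluster = function(key, value) {
    return [value.sx / value.n, value.sy / value.n];
};
```

As far as I know, the documentation says the finalize function should not access the database, so writing the new cluster points back to a collection is probably better done from the client after each iteration.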
I suppose I could do a findOne() on the cluster document to load the cluster points in the map function, but do we want to load this document on every call to map? Or is there a way to load it once per iteration?
So, it looks like you can do this with a scope variable like this:
db.main.mapReduce( mapCluster, reduceCluster, { scope: { pts: pts, ... }} );
You have to be careful with the variable names in the scope: they are placed in the scope of the map, reduce and finalize functions, where they can collide with existing variable names.
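Putting the scope option together with iteration, a client-side driver loop might look roughly like the sketch below. This is only an outline under some assumptions: runKMeans, maxIter and tol are names I made up, the finalize function is assumed to return the new cluster point as [x, y], and the results are returned inline rather than written to a collection.

```javascript
// coll: a collection such as db.main; mapFn / reduceFn / finalizeFn: the
// map-reduce functions for this job; pts: current cluster points [[x, y], ...].
function runKMeans(coll, mapFn, reduceFn, finalizeFn, pts, maxIter, tol) {
    for (var iter = 0; iter < maxIter; iter++) {
        var res = coll.mapReduce(mapFn, reduceFn, {
            out: {inline: 1},      // return results instead of writing a collection
            scope: {pts: pts},     // exposed as a global named pts in the functions
            finalize: finalizeFn
        });
        var moved = 0;
        var next = pts.slice();    // cluster points with no members stay where they are
        res.results.forEach(function(r) {
            var dx = r.value[0] - pts[r._id][0];
            var dy = r.value[1] - pts[r._id][1];
            moved = Math.max(moved, Math.sqrt(dx * dx + dy * dy));
            next[r._id] = r.value;
        });
        pts = next;
        if (moved < tol) break;    // no cluster point moved noticeably: converged
    }
    return pts;
}
```

Because scope is passed fresh on every call, the cluster points are loaded once per iteration rather than once per document, which is what the findOne() approach would not give you.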