MongoDB MapReduce: global variables in an instance of a map function?

I wrote a MapReduce job in MongoDB and would like to use a global variable as a cache to write to and read from. I know that it is impossible to share global variables across instances of a map function; I just need a global variable within each instance of the function. This kind of functionality exists in Hadoop MapReduce, so I expected it to be available in MongoDB as well. But the following does not seem to work:

    var cache = {}; // Does not seem to work!

    function () {
        var hashValue = this.varValue1 + this.varValue2;
        if (typeof(cache[hashValue]) != 'undefined') {
            // Do nothing, we've processed at least one input record with this hash
        } else {
            // Process the input record
            // Cache the record
            cache[hashValue] = '1';
        }
    }

Is this invalid in the MongoDB MapReduce implementation, or am I doing something wrong in JavaScript (not tested in JS)?

+7
mongodb mapreduce
2 answers

Looking at the docs, I find the following:

    db.runCommand({
        mapreduce: <collection>,
        map: <mapfunction>,
        reduce: <reducefunction>
        [, scope: <object where fields go into javascript global scope>]
    });

I think the "scope" variable is what you need.

There is a test case on GitHub that uses the "scope" variable.

I'm still new to this, but hopefully that's enough for you to get started.

+5

As Gates VP said, you need to add cache to the global scope. So, to give a complete answer based on your script, this is what you will need to do:

    db.runCommand({
        mapreduce: <your collection>,
        map: <your map function, or reference to it>,
        reduce: <your reduce function, or reference to it>,
        scope: { cache: {} }
    });

The command will inject the contents of the "scope" object parameter into your global context. Then caching will work the way you use it in your map function. I tested this.
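As a rough illustration of what that injection means, here is a plain JavaScript sketch that runs outside MongoDB. It is not MongoDB's implementation. It rebuilds a map function from its source text and makes the fields of a scope object visible to it by name, which is roughly the effect the "scope" parameter has. The helper `runMapWithScope`, the stand-in `emit`, and the sample documents are all hypothetical names made up for this example.

```javascript
// Hypothetical helper (not a MongoDB API): rebuilds a map function from source
// text and exposes each field of `scope` to it under the field's own name.
function runMapWithScope(mapSource, scope, docs) {
  const emitted = [];
  const names = Object.keys(scope);
  // Only `emit` and the scope fields are visible to the rebuilt function;
  // variables from the caller's surrounding code are not.
  const factory = new Function("emit", ...names, "return (" + mapSource + ");");
  const emit = (key, value) => emitted.push([key, value]);
  const map = factory(emit, ...names.map((name) => scope[name]));
  // Like a map function in MapReduce, each document is bound to `this`.
  for (const doc of docs) {
    map.call(doc);
  }
  return emitted;
}

// The map function from the question, with an emit added for the "process" step.
const mapSource = `function () {
  var hashValue = this.varValue1 + this.varValue2;
  if (typeof cache[hashValue] === 'undefined') {
    emit(hashValue, 1);        // first record with this hash: process it
    cache[hashValue] = '1';    // remember that this hash has been seen
  }
}`;

// Made-up sample documents; the first two share the same hash.
const docs = [
  { varValue1: "a", varValue2: "b" },
  { varValue1: "a", varValue2: "b" },
  { varValue1: "c", varValue2: "d" },
];

const result = runMapWithScope(mapSource, { cache: {} }, docs);
console.log(result);
```

In this sketch the duplicate document is skipped because `cache` persists between calls to the same function instance, which is the behaviour the question asks for.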

+1