I'm trying to understand the basics of MapReduce in MongoDB, and even after implementing it I'm not sure how it differs from SQL's GROUP BY or even Mongo's own GROUP BY. In SQL Server, GROUP BY can be executed by a stream aggregate or a hash aggregate. Is MapReduce like a hash aggregate, just spread across a lot of servers?
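To make the comparison concrete, this is roughly how I picture the two approaches. It is only a minimal Python sketch with made-up data and hypothetical function names (`hash_aggregate`, `map_phase`, `reduce_phase`, `map_reduce`); it is not how SQL Server or MongoDB actually implement either operation, and the "shards" here are just list slices standing in for servers.

```python
from collections import defaultdict

# Hypothetical sample rows: (customer, amount). Invented for illustration.
ORDERS = [("alice", 10), ("bob", 5), ("alice", 7), ("carol", 3), ("bob", 1)]


def hash_aggregate(rows):
    """GROUP BY customer, SUM(amount) using a single in-memory hash table."""
    totals = defaultdict(int)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


def map_phase(rows):
    """Emit one (key, value) pair per input row."""
    for key, amount in rows:
        yield key, amount


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single result."""
    return sum(values)


def map_reduce(shards):
    """Run map on each shard, group the emitted pairs by key, then reduce."""
    grouped = defaultdict(list)
    for shard in shards:  # each shard stands in for a separate server
        for key, value in map_phase(shard):
            grouped[key].append(value)
    return {key: reduce_phase(key, values) for key, values in grouped.items()}


if __name__ == "__main__":
    shards = [ORDERS[:2], ORDERS[2:]]  # pretend the data lives on two servers
    print(hash_aggregate(ORDERS))
    print(map_reduce(shards))
```

For a simple sum like this, both functions give the same totals, which is exactly why I can't see where the difference lies.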
I have read in several places that MR in MongoDB should be run as a background process because it is a "heavy operation". Given that the data is sharded, won't GROUP BY be equally "heavy"? To be clear, I am only comparing the kinds of operations that can be implemented both as an MR job and as a GROUP BY query.
Is there something that GROUP BY cannot do, and only MR can do?
Also, Hadoop seems to be very good at MR (this is only what I have read; I have never worked with Hadoop). How does Hadoop's MR differ from the Mongo model?
I'm confused. Please help, or point me to a good tutorial that explains why MapReduce is needed.
Aafreen sheikh