Hadoop: The Definitive Guide is a good place to start. The introductory chapters should help you figure out where MapReduce is useful and when you should use it. The more advanced chapters have many examples that are more realistic than word count.
If you want to dive deeper, have a look at Data-Intensive Text Processing with MapReduce. It definitely has many "real" use cases, but it seems like you're not interested in text processing.
For your specific example, the main things to keep in mind are:
- The map phase is primarily intended for parsing, transforming, and filtering data. Think of record-by-record, shared-nothing processing: each record is handled on its own, without knowledge of any other record. In word count, this is parsing the line and splitting out the words.
- The reduce phase is all about aggregation: counting, averaging, min/max, etc. In word count, this is counting the instances of each word.
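The split between the two phases can be sketched in plain Python. This is only a local simulation to show which work belongs where, not the Hadoop API; the function names `map_phase`, `reduce_phase`, and `run_job` are made up for illustration, and the grouping step stands in for the shuffle that the framework would normally do for you.

```python
from collections import defaultdict


def map_phase(line):
    """Map: parse one record (a line) on its own and emit (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def reduce_phase(word, counts):
    """Reduce: aggregate every value that was emitted for a single key."""
    return word, sum(counts)


def run_job(lines):
    """Simulate a job locally: map each record, group by key, then reduce."""
    # Grouping by key is what the shuffle step does between map and reduce.
    grouped = defaultdict(list)
    for line in lines:
        for word, one in map_phase(line):
            grouped[word].append(one)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())
```

Calling `run_job(["a b a", "b c"])` counts each word across both lines. Note that `map_phase` never sees more than one line at a time, which is what makes the map side easy to spread over many machines.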
So, if you want all the records for a given product in the month of May, you can use a map-only job to filter through all of the data and keep only the records you need. However, you should really read up on what Hadoop is useful for. A question that Hadoop is better suited for would be: give me a count of how many times each item was bought in each month (perhaps building a matrix). Very rarely are you looking for specific records in the way you are suggesting.
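To make the difference between the two kinds of question concrete, here is a small plain-Python sketch. Everything in it is an assumption for illustration: the record layout (a product id plus a `YYYY-MM-DD` purchase date), the sample data, and the function names. It is not Hadoop code, and `Counter` simply stands in for the shuffle plus a summing reducer.

```python
from collections import Counter

# Hypothetical purchase records: (product_id, purchase_date as "YYYY-MM-DD").
PURCHASES = [
    ("widget", "2012-05-03"),
    ("gadget", "2012-05-17"),
    ("widget", "2012-06-01"),
    ("widget", "2012-05-21"),
]


def filter_map(record, product, month):
    """Map-only job: emit the record unchanged if it matches, else nothing."""
    product_id, date = record
    if product_id == product and date[5:7] == month:
        yield record


def count_map(record):
    """Map for the aggregate question: the key is (product, year-month)."""
    product_id, date = record
    yield (product_id, date[:7]), 1


def purchases_per_month(records):
    """Sum the emitted 1s per key, as a counting reducer would."""
    totals = Counter()
    for record in records:
        for key, value in count_map(record):
            totals[key] += value
    return dict(totals)
```

The filter needs no reduce step at all, since every record is accepted or dropped by itself. The per-month count is the kind of full-dataset aggregation where the map/reduce split actually pays off.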
If you are looking for a platform that is more accessible in real time, you should check out HBase once you are done exploring Hadoop.
Donald Miner