MapReduce for Dummies

OK, I'm trying to learn Hadoop and MapReduce. I really want to start with MapReduce, and I have found many, many simplified examples of mappers and reducers, etc. However, I seem to be missing something. Although an example that counts how many times a word occurs in a document is simple to understand, it does not really help me solve any "real world" problems. Does anyone know of a good tutorial on implementing MapReduce in a pseudo-realistic situation? Say, for example, I want to use Hadoop and MapReduce on top of a data warehouse like AdventureWorks. Now I want to get the orders for a given product in the month of May. What would that look like from a Hadoop/MapReduce perspective? (I realize this may not be the type of problem MapReduce was designed to solve, but it is just what came to mind quickly.)

Any direction will help.

+8
mapreduce hadoop
3 answers

Hadoop: The Definitive Guide is a good place to start. The introductory chapters should be really helpful for you to figure out where MapReduce is useful and when you should use it. More advanced chapters have many more realistic examples than word count.

If you want to dive deeper, you can check out Data-Intensive Text Processing with MapReduce. It definitely has many "real" use cases, but it seems like you are not interested in text processing.


For your specific example, the main things to keep in mind are:

  • The map phase is primarily for parsing, transforming, and filtering data. Think of it as processing the data record by record, with nothing shared between records. In word count, this is parsing the line and splitting out the words.
  • The reduce phase is for aggregation: counting, averaging, min/max, etc. In word count, this is counting the instances of each word.
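The two phases above can be sketched in plain Python. This is only an illustration of the idea, not Hadoop's API: the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are made up for this example, and the in-memory `shuffle` step stands in for the grouping that the framework does between the two phases.

```python
from collections import defaultdict


def map_words(line):
    """Map phase: parse one record (a line) and emit (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce phase: aggregate every value emitted for one key."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(key, values) for key, values in shuffle(pairs).items())
```

Note that `map_words` looks at a single line and nothing else, which is what lets a real cluster run many mappers in parallel on different parts of the input.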

So, if you want all records for a given product in the month of May, you can use a map-only job to filter all the data and keep only the records you need. However, you should really read up on what Hadoop is useful for. A question that Hadoop is better suited to would be: give me a count of how many times each item was bought in each month (maybe to build a matrix). It is very rarely used to look up specific records like the ones you suggest.
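As a rough sketch of both ideas in plain Python (again, not the Hadoop API): the record layout, the product IDs, and all function names below are assumptions made up for illustration, and a `Counter` stands in for the shuffle and reduce steps.

```python
from collections import Counter
from datetime import date

# Hypothetical order records: (order_id, product_id, order_date)
ORDERS = [
    (1, "BK-1", date(2011, 5, 3)),
    (2, "BK-1", date(2011, 6, 9)),
    (3, "HL-7", date(2011, 5, 20)),
    (4, "BK-1", date(2011, 5, 28)),
]


def filter_mapper(record, product_id, month):
    """Map-only job: emit the record unchanged if it matches, else nothing."""
    _, product, order_date = record
    if product == product_id and order_date.month == month:
        yield record


def count_mapper(record):
    """Emit ((product, month), 1) for the aggregation job."""
    _, product, order_date = record
    yield (product, order_date.month), 1


def purchases_per_product_month(records):
    """Sum the emitted ones per (product, month) key, as a reducer would."""
    totals = Counter()
    for record in records:
        for key, one in count_mapper(record):
            totals[key] += one
    return dict(totals)


# Map-only filter: orders for the hypothetical product "BK-1" in May.
may_orders = [r for rec in ORDERS for r in filter_mapper(rec, "BK-1", 5)]
```

The filter needs no reducer at all, while the per-month count touches every record and aggregates, which is the kind of whole-dataset question the answer above says Hadoop is better suited to.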

If you are looking for a platform with more real-time access to the data, you should check out HBase once you have finished exploring Hadoop.

+13

Hadoop can be used for a variety of tasks. Check out this atbrox blog post. In addition, there is a lot of information about Hadoop and MapReduce on the Internet, and it's easy to get lost. So here is a consolidated list of resources on Hadoop.

BTW, the 3rd edition of Hadoop: The Definitive Guide is due in May. It looks like it also covers MRv2 (NextGen MapReduce) and includes more case studies. The 2nd edition is the one mentioned by orangeoctopus.

+4

MapReduce can be a complex topic, so I found it easier to understand by applying its approach to a simple problem. I then go on to describe how MapReduce makes it easier to solve the same problem on a cluster. You can take a look at my article here: Introduction to parallel processing using MapReduce.

Let me know if you think this article makes understanding MapReduce and Hadoop easier.

0
