A friend of mine was asked this Hadoop MapReduce question. We have several stores, and each store has many customers who visit and buy things. The data set consists of records of the form "Store #, Customer #, Quantity purchased". You need to write MapReduce code to get the top 2 customers for each store.
The solution I was thinking about is to do a secondary sort on qty (in descending order, with store + qty forming a composite key) and emit only the first 2 values (customers) for each key (store + qty, where qty is part of the composite key). This works if each customer appears only once per store, but if a customer visited the same store several times, how should it be done?
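To make the secondary-sort idea concrete, here is a minimal sketch in plain Python that only simulates the map, shuffle and reduce phases; it is not Hadoop API code. The sample records and the function name `top2_by_secondary_sort` are made up for illustration, and the sketch assumes each customer appears at most once per store, which is exactly the case where this approach works.

```python
from itertools import groupby, islice

# Hypothetical sample records: (store, customer, qty), one record per customer per store.
records = [
    ("S1", "C1", 5), ("S1", "C2", 9), ("S1", "C3", 7),
    ("S2", "C1", 4), ("S2", "C4", 1),
]

def top2_by_secondary_sort(records):
    # Map: emit the composite key (store, qty) with the customer as the value.
    mapped = [((store, qty), customer) for store, customer, qty in records]
    # Shuffle: sort by store ascending, then qty descending (role of the sort comparator).
    mapped.sort(key=lambda kv: (kv[0][0], -kv[0][1]))
    result = {}
    # Group by store only (role of the grouping comparator): one reduce call per store.
    for store, group in groupby(mapped, key=lambda kv: kv[0][0]):
        # Reduce: values arrive in descending qty order, so keep just the first two.
        result[store] = [(customer, key[1]) for key, customer in islice(group, 2)]
    return result

top2 = top2_by_secondary_sort(records)
```

With repeat visits, the same customer could occupy both of the first two slots, or a customer with many small purchases could be missed, because the quantities are never summed.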
One solution is to loop through the values in the reducer, add up qty for each customer, and then sort the customers by total qty. That means I would be implementing the sorting logic again myself, and I am not sure whether I could use a TreeMap / HashMap etc., since there might be memory limitations.
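A rough sketch of that reducer-side aggregation, again in plain Python rather than Hadoop code. The function name `reduce_store` and the sample values are hypothetical. The dictionary plays the role of the HashMap, so its size grows with the number of distinct customers in one store; picking the top 2 with `heapq.nlargest` avoids a full sort but does not remove that memory cost.

```python
import heapq
from collections import defaultdict

def reduce_store(store, values):
    """Simulated reducer: values is an iterable of (customer, qty) pairs for one store."""
    totals = defaultdict(int)
    for customer, qty in values:
        # Sum quantities so repeat visits by the same customer are combined.
        totals[customer] += qty
    # Select the two largest totals without sorting every customer.
    return heapq.nlargest(2, totals.items(), key=lambda kv: kv[1])

# Hypothetical input for store S1: customer C1 visited twice.
values = [("C1", 5), ("C2", 9), ("C1", 6), ("C3", 7)]
top2 = reduce_store("S1", values)
```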
Another solution is to write 2 MapReduce jobs that run one after the other. The first computes the total qty purchased for each customer and store. The second sorts by qty and gets the top 2 buyers for each store.
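The two chained jobs could look roughly like this. It is a plain-Python simulation under assumed sample data; `job1_sum` and `job2_top2` are illustrative names, not Hadoop classes. In a real first job the framework would group records by the (store, customer) key, so a reducer would only need one running sum at a time; the dictionary here merely stands in for that shuffle.

```python
from collections import defaultdict
from itertools import groupby

def job1_sum(records):
    # Job 1: key = (store, customer); the reducer sums qty for each key.
    totals = defaultdict(int)
    for store, customer, qty in records:
        totals[(store, customer)] += qty
    return [(store, customer, total) for (store, customer), total in totals.items()]

def job2_top2(summed):
    # Job 2: sort by store, then by total qty descending, and keep two rows per store.
    ordered = sorted(summed, key=lambda r: (r[0], -r[2]))
    return {
        store: [(customer, total) for _, customer, total in list(group)[:2]]
        for store, group in groupby(ordered, key=lambda r: r[0])
    }

# Hypothetical records: (store, customer, qty), with repeat visits.
records = [
    ("S1", "C1", 5), ("S1", "C2", 9), ("S1", "C1", 6), ("S1", "C3", 7),
    ("S2", "C4", 3), ("S2", "C4", 2), ("S2", "C5", 4),
]
top2 = job2_top2(job1_sum(records))
```

Since the first job leaves exactly one row per (store, customer), the second job is back in the "unique customer" case where taking the first two sorted values is safe.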
Is there any other way to achieve this, also taking memory limitations into account?