I have a huge DynamoDB table that I want to analyze in order to aggregate the data stored in its attributes. The aggregated data should then be processed by a Java application. Although I understand the basic concepts of MapReduce, I have never used it before.
In my case, let's say that each DynamoDB item has a customerId and an orderNumbers attribute, and that there can be more than one item for the same customer. For example:
customerId: 1, orderNumbers: 2
customerId: 1, orderNumbers: 6
customerId: 2, orderNumbers: -1
Basically, I want to sum orderNumbers for each customerId, and then do some Java operations using the aggregated result.
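To make the goal concrete, here is a minimal plain-Java sketch of the aggregation I have in mind, run over the three sample items above. It does not touch DynamoDB or EMR at all. The `Order` class and the `sumByCustomer` method are hypothetical names I made up for illustration, and the in-memory list is only a stand-in for the real table.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OrderAggregation {

    // Hypothetical in-memory stand-in for one DynamoDB item.
    static final class Order {
        final int customerId;
        final int orderNumbers;

        Order(int customerId, int orderNumbers) {
            this.customerId = customerId;
            this.orderNumbers = orderNumbers;
        }
    }

    // Sums orderNumbers per customerId. Conceptually, the "map" step emits
    // (customerId, orderNumbers) pairs and the "reduce" step adds up the
    // values that share the same key.
    static Map<Integer, Integer> sumByCustomer(List<Order> orders) {
        Map<Integer, Integer> totals = new LinkedHashMap<>();
        for (Order order : orders) {
            totals.merge(order.customerId, order.orderNumbers, Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Order> sample = List.of(
                new Order(1, 2),
                new Order(1, 6),
                new Order(2, -1));
        System.out.println(sumByCustomer(sample)); // prints {1=8, 2=-1}
    }
}
```

What I am after is the same per-customer totals, but computed over the whole table rather than an in-memory list.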
AWS Elastic MapReduce can probably help me, but I don't understand how to connect a custom JAR to DynamoDB. My custom JAR should probably expose map and reduce functions. Where can I find the right interface to implement?
I'm also a bit confused by the docs: it seems that I must first export my data to S3 before launching my JAR. Is that correct?
thanks
Mark