Counting unique users with MapReduce for Java App Engine

I am trying to count the number of unique users per day in a Java App Engine application. I decided to use the MapReduce framework (mapreduce.appspot.com) for Java App Engine to do this calculation offline. I have managed to create a map job that iterates over all of my entities, each of which represents a single user session event. Using a simple counter works. I have a few questions:

1) How do I increment the counter only once per user ID? I am currently mapping over entities that have a user ID property, but many of these entities can contain the same user ID, so how can I count each ID only once?

2) Once the job's results are stored in these counters, how can I persist them to the datastore? I can see the counter values on the MapReduce status page, but I want them to be written to the datastore automatically.

Ideas?

java google-app-engine parallel-processing mapreduce
1 answer

I haven't used the MapReduce functionality yet, but my theoretical understanding is that you can write to the datastore from your mapper. You could create an entity kind called UniqueCount and insert one entity each time your mapper sees an ID that it has not seen before. Then you can count how many unique IDs you have. In fact, you could simply update a counter every time you find a new unique ID. You will probably want a sharded counter; search Google for tips on building a datastore counter that can handle high throughput.
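To make the idea above concrete, here is a minimal sketch in plain Java. It deliberately does not use any App Engine API: the `seen` map stands in for the UniqueCount entities (one marker per day and user ID), and an array of atomic longs stands in for a sharded counter. The class name, method names, key format, and shard count are all hypothetical, chosen only for illustration.

```java
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLongArray;

// In-memory stand-in for the datastore-backed approach described above.
// Nothing here is App Engine API; every name is hypothetical.
class UniqueUserCounter {
    private static final int NUM_SHARDS = 8; // assumed shard count

    // Plays the role of the UniqueCount kind: one marker per (day, userId) key.
    private final Map<String, Boolean> seen = new ConcurrentHashMap<>();
    // Plays the role of the sharded counter: one set of shards per day.
    private final Map<String, AtomicLongArray> shards = new ConcurrentHashMap<>();
    private final Random random = new Random();

    // Called once per session event, as a mapper would be.
    void map(String day, String userId) {
        String key = day + "|" + userId;
        // putIfAbsent returns null only for the first writer of this key,
        // so repeated events for the same user and day are ignored.
        if (seen.putIfAbsent(key, Boolean.TRUE) == null) {
            AtomicLongArray dayShards =
                shards.computeIfAbsent(day, d -> new AtomicLongArray(NUM_SHARDS));
            // Spread increments over random shards to reduce write contention.
            dayShards.incrementAndGet(random.nextInt(NUM_SHARDS));
        }
    }

    // Reading the counter means summing all of its shards.
    long uniqueUsers(String day) {
        AtomicLongArray dayShards = shards.get(day);
        if (dayShards == null) {
            return 0;
        }
        long total = 0;
        for (int i = 0; i < dayShards.length(); i++) {
            total += dayShards.get(i);
        }
        return total;
    }
}
```

In a real job, the "insert if absent" step and the shard increment would each be datastore writes, and the first one would presumably need a transaction so that two mappers seeing the same ID at the same time do not both count it.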

Eventually, once they finish the Reduce functionality, I think this whole task will become quite trivial.
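As a rough illustration of why a reduce phase would simplify this, the sketch below simulates the three steps in plain Java: the map step emits (day, user ID) pairs, the shuffle groups them by day, and the reduce step counts the distinct IDs in each group. It is not the App Engine MapReduce API; the class name, method name, and the `String[]{day, userId}` event shape are assumptions made for the example.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java simulation of a map/shuffle/reduce pass; all names are hypothetical.
class UniqueUsersByReduce {
    // Each event is a two-element array: {day, userId}.
    static Map<String, Integer> countUnique(List<String[]> events) {
        // Map + shuffle: group user IDs by day; the set drops duplicates.
        Map<String, Set<String>> grouped = new TreeMap<>();
        for (String[] event : events) {
            grouped.computeIfAbsent(event[0], d -> new HashSet<>()).add(event[1]);
        }
        // Reduce: one output value per day, the number of distinct user IDs.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, Set<String>> group : grouped.entrySet()) {
            result.put(group.getKey(), group.getValue().size());
        }
        return result;
    }
}
```

With this shape, no shared counter is needed during the map step; each day's total is produced once, in the reduce step, and could be written to the datastore from there.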
