Real-Time Aggregation Strategies in MongoDB

While researching real-time analytics with MongoDB, I found that there seems to be a pretty standard way to do sums, but nothing for more complex aggregation. Some things that helped ...

The basic approach for doing sums is to atomically increment document keys for each new record that enters the system, in order to cache common queries:

Stats.collection.update({"keys" => ["a", "b", "c"]}, {"$inc" => {"counter_1" => 1, "counter_2" => 1}}, {:upsert => true})

This does not work for aggregates other than sums. My question is: can something like this be done for average, min, and max in MongoDB?

Say you have a document like this:

 {
   :date => "04/27/2011",
   :page_views => 1000,
   :user_birthdays => ["12/10/1980", "6/22/1971", ...] # 1000 total
 }

Could you do some atomic or optimized operation that grouped the birthdays into something like this?

 {
   :date => "04/27/2011",
   :page_views => 1000,
   :user_birthdays => ["12/10/1980", "6/22/1971", ...], # 1000 total
   :average_age => 27.8,
   :age_rank => {
     "0 to 20" => 180,
     "20 to 30" => 720,
     "30 to 40" => 100,
     "40 to 50" => 0
   }
 }

... just like you can do Doc.collection.update({x => 1}, {"$push" => {"user_birthdays" => "12/10/1980"}}) to push something onto an array without loading the document, can you do something similar to average / aggregate the array? Is there anything along these lines that you use for real-time aggregation?

MapReduce is meant for batch processing jobs; I am looking for patterns for doing something similar in real time for:

  • Averages: every time you push a new element onto an array in MongoDB, what is the best way to average those values in real time?
  • Grouping: if you group ages into 10-year brackets and you have an array of ages, how could you optimally update the count for each group as you update the document with a new age? Assume ages will constantly be pushed onto / pulled from the array.
  • Min / Max: what are some ways to calculate and store the min / max of the array of ages in that document?
2 answers

"Could you do some atomic or optimized operation that grouped the birthdays into something like this?"

It looks like you added two fields, age_rank and average_age. These are effectively calculated fields based on data that you already have. If I gave you a document with the page views and user birthdays, it should be trivial for client code to find the min / max, average, etc.
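As a rough illustration of how little client code this takes, here is a minimal Ruby sketch. It assumes the document has already been loaded as a hash shaped like the one in the question, that dates use the month/day/year format shown there, and that brackets are 10 years wide as in the "Grouping" bullet (the question's sample document starts with a 20-year "0 to 20" bracket, which this does not reproduce). The names age_on and birthday_stats, and the :min_age / :max_age keys, are made up for this example.

```ruby
require "date"

# Age in whole years on the given day.
def age_on(birthday, today)
  age = today.year - birthday.year
  # Subtract one if the birthday has not yet occurred this year.
  age -= 1 if ([today.month, today.day] <=> [birthday.month, birthday.day]) < 0
  age
end

# Derive the calculated fields from a document that only holds raw data.
def birthday_stats(doc)
  today = Date.strptime(doc[:date], "%m/%d/%Y")
  ages = doc[:user_birthdays].map do |b|
    age_on(Date.strptime(b, "%m/%d/%Y"), today)
  end
  return nil if ages.empty?

  # Count ages per 10-year bracket, e.g. "30 to 40".
  ranks = Hash.new(0)
  ages.each do |age|
    low = (age / 10) * 10
    ranks["#{low} to #{low + 10}"] += 1
  end

  {
    :average_age => ages.sum.to_f / ages.size,
    :min_age => ages.min,
    :max_age => ages.max,
    :age_rank => ranks
  }
end

doc = {
  :date => "04/27/2011",
  :page_views => 2,
  :user_birthdays => ["12/10/1980", "6/22/1971"]
}
stats = birthday_stats(doc)
```

The trade-off is that every reader has to load the whole birthdays array and recompute, which is exactly the cost the question is trying to avoid.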

It seems to me that you are asking MongoDB to perform server-side aggregation for you, but with the added constraint that you do not want to use Map / Reduce.

If I understand your question correctly, you are looking for something where you can say: "Add this element to the array and update all dependent fields"? You do not want readers to perform any logic; you want everything to happen "magically" on the server side.

So, there are three different ways to solve this problem, but only one of them is currently available:

  • Write this logic client-side. This does not look like the solution you want, but it will work. If you have the underlying data, computing max / min / median / avg should be pretty trivial in most languages.
  • Use the upcoming Aggregation features. They are not scheduled until 1.9.x. Improved aggregation will let you retrieve the data you are looking for, but you will still have to write the appropriate queries. The underlying database will still not contain the summary data itself.
  • Use triggers. If you really want the database to always be consistent and to contain the summarized data, then this is what you need. However, triggers do not exist yet.

Unfortunately, your only option right now is #1. Fortunately, I know several people who are using option #1 successfully.
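One possible shape for option #1, offered as a sketch rather than a recommendation: have the client work out the age and its bracket, then extend the $inc pattern from the question so the same update that pushes the birthday also bumps a running sum, a running count, and a per-bracket counter. The average is then derived at read time as sum / count. The field names age_sum and age_count and the helper names below are hypothetical, and the 10-year bracket labels are an assumption. The sketch only builds the update document; it would be passed to collection.update the same way as the $push example in the question.

```ruby
# Build the update document for one newly added birthday.
# `age` is computed by the client before calling this.
def birthday_update(birthday, age)
  low = (age / 10) * 10
  {
    "$push" => { "user_birthdays" => birthday },
    "$inc" => {
      "age_sum" => age,                          # hypothetical running total
      "age_count" => 1,                          # hypothetical running count
      "age_rank.#{low} to #{low + 10}" => 1      # per-bracket counter
    }
  }
end

# Derive the average from the stored counters when reading.
def average_age(doc)
  return nil if doc["age_count"].to_i.zero?
  doc["age_sum"].to_f / doc["age_count"]
end

update = birthday_update("12/10/1980", 30)
```

Removing a birthday would need the mirror image (a $pull with negative increments). Min and max are not covered here, because removing the current extreme cannot be fixed up with a counter and requires looking at the remaining values.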


Work is planned for the upcoming 1.9.x release, which may include aggregation support.

See: https://jira.mongodb.org/browse/SERVER-447

Of course, it could get bumped to a later release.
