MongoDB Aggregation: calculate current totals from the sum of previous rows

Examples of documents:

    { _id: ObjectId('4f442120eb03305789000000'), time: ISODate("2013-10-10T20:55:36Z"), value: 1 },
    { _id: ObjectId('4f442120eb03305789000001'), time: ISODate("2013-10-10T22:43:16Z"), value: 2 },
    { _id: ObjectId('4f442120eb03305789000002'), time: ISODate("2013-10-11T03:12:06Z"), value: 3 },
    { _id: ObjectId('4f442120eb03305789000003'), time: ISODate("2013-10-11T10:15:38Z"), value: 4 },
    { _id: ObjectId('4f442120eb03305789000004'), time: ISODate("2013-10-12T02:15:38Z"), value: 5 }

It's easy to get aggregated results grouped by date. But I also want a running total accumulated across the groups, for example:

    { time: "2013-10-10", total: 3, runningTotal: 3 },
    { time: "2013-10-11", total: 7, runningTotal: 10 },
    { time: "2013-10-12", total: 5, runningTotal: 15 }

Is this possible with MongoDB aggregation?

3 answers

This does what you need. I normalized the time values in the data so they group by day (you can also do that inside the pipeline, e.g. with $dateToString). The idea is to $group by time, then $push the times and the totals into two separate arrays. $unwind on the time array then gives every time document its own copy of the totals array, so you can compute runningTotal (or something like a moving average) from an array containing the data for all times. The index generated by $unwind (via includeArrayIndex) is the array index of the total corresponding to that time. The $sort before the second $group is important, because it guarantees the arrays are pushed in the correct order.

    db.temp.aggregate([
        { '$group': { '_id': '$time', 'total': { '$sum': '$value' } } },
        { '$sort': { '_id': 1 } },
        { '$group': {
            '_id': 0,
            'time': { '$push': '$_id' },
            'totals': { '$push': '$total' }
        } },
        { '$unwind': { 'path': '$time', 'includeArrayIndex': 'index' } },
        { '$project': {
            '_id': 0,
            'time': { '$dateToString': { 'format': '%Y-%m-%d', 'date': '$time' } },
            'total': { '$arrayElemAt': [ '$totals', '$index' ] },
            'runningTotal': { '$sum': { '$slice': [ '$totals', { '$add': [ '$index', 1 ] } ] } }
        } }
    ]);

I used something similar on a collection with ~80,000 documents, which aggregated down to 63 results. I'm not sure how well it scales to larger collections, but I found that transformations (projections, array manipulation) on already-aggregated data don't cause a significant performance loss once the data has been reduced to a manageable size.
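To make the array/index logic above concrete, here is a minimal plain-JavaScript sketch (outside MongoDB) of what the final $project stage computes: for each position, total comes from $arrayElemAt, and runningTotal is the $sum of a $slice of the totals array up to and including that position. The function name and input shape are illustrative, not part of the answer.

```javascript
// Sketch of the $unwind + $project logic: days must already be sorted
// by date, as the $sort stage in the pipeline guarantees.
function withRunningTotals(days) {
  // days: [{ time: 'YYYY-MM-DD', total: n }, ...]
  const totals = days.map(d => d.total);       // the 'totals' array built by $push
  return days.map((d, index) => ({
    time: d.time,
    total: totals[index],                      // $arrayElemAt: ['$totals', '$index']
    runningTotal: totals
      .slice(0, index + 1)                     // $slice: ['$totals', { $add: ['$index', 1] }]
      .reduce((a, b) => a + b, 0),             // $sum over the slice
  }));
}

const out = withRunningTotals([
  { time: '2013-10-10', total: 3 },
  { time: '2013-10-11', total: 7 },
  { time: '2013-10-12', total: 5 },
]);
// out[2] is { time: '2013-10-12', total: 5, runningTotal: 15 }
```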


Here is another approach.

The pipeline:

    db.col.aggregate([
        { $group: {
            _id: { time: { $dateToString: { format: "%Y-%m-%d", date: "$time", timezone: "-05:00" } } },
            value: { $sum: "$value" }
        } },
        { $addFields: { _id: "$_id.time" } },
        { $sort: { _id: 1 } },
        { $group: { _id: null, data: { $push: "$$ROOT" } } },
        { $addFields: { data: {
            $reduce: {
                input: "$data",
                initialValue: { total: 0, d: [] },
                in: {
                    total: { $sum: [ "$$this.value", "$$value.total" ] },
                    d: { $concatArrays: [
                        "$$value.d",
                        [ {
                            _id: "$$this._id",
                            value: "$$this.value",
                            runningTotal: { $sum: [ "$$value.total", "$$this.value" ] }
                        } ]
                    ] }
                }
            }
        } } },
        { $unwind: "$data.d" },
        { $replaceRoot: { newRoot: "$data.d" } }
    ]).pretty()

The collection:

    > db.col.find()
    { "_id" : ObjectId("4f442120eb03305789000000"), "time" : ISODate("2013-10-10T20:55:36Z"), "value" : 1 }
    { "_id" : ObjectId("4f442120eb03305789000001"), "time" : ISODate("2013-10-11T04:43:16Z"), "value" : 2 }
    { "_id" : ObjectId("4f442120eb03305789000002"), "time" : ISODate("2013-10-12T03:13:06Z"), "value" : 3 }
    { "_id" : ObjectId("4f442120eb03305789000003"), "time" : ISODate("2013-10-11T10:15:38Z"), "value" : 4 }
    { "_id" : ObjectId("4f442120eb03305789000004"), "time" : ISODate("2013-10-13T02:15:38Z"), "value" : 5 }

The result:

    { "_id" : "2013-10-10", "value" : 3, "runningTotal" : 3 }
    { "_id" : "2013-10-11", "value" : 7, "runningTotal" : 10 }
    { "_id" : "2013-10-12", "value" : 5, "runningTotal" : 15 }
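The $reduce stage is the heart of this pipeline: it folds over the per-day documents, carrying an accumulator shaped like initialValue. A minimal plain-JavaScript sketch of that fold (function and variable names are illustrative):

```javascript
// Sketch of the $reduce accumulator: $$value is the accumulator,
// $$this is the current array element; data must be sorted by _id.
function accumulate(data) {
  // data: [{ _id: 'YYYY-MM-DD', value: n }, ...]
  const acc = data.reduce(
    (value, cur) => ({
      total: value.total + cur.value,        // $sum: ['$$this.value', '$$value.total']
      d: value.d.concat([{                   // $concatArrays: ['$$value.d', [ ... ]]
        _id: cur._id,
        value: cur.value,
        runningTotal: value.total + cur.value,
      }]),
    }),
    { total: 0, d: [] }                      // initialValue
  );
  return acc.d;                              // what $unwind + $replaceRoot then emit
}

const res = accumulate([
  { _id: '2013-10-10', value: 3 },
  { _id: '2013-10-11', value: 7 },
  { _id: '2013-10-12', value: 5 },
]);
// res[1] is { _id: '2013-10-11', value: 7, runningTotal: 10 }
```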

Here is a solution that avoids pushing the previous documents into a new array and then processing them. (If that array grows too large, you can exceed the maximum BSON document size of 16 MB.)

Calculating the running totals is simple as well:

    db.collection1.aggregate([
        { $lookup: {
            from: 'collection1',
            let: { date_to: '$time' },
            pipeline: [
                { $match: { $expr: { $lt: [ '$time', '$$date_to' ] } } },
                { $group: { _id: null, summary: { $sum: '$value' } } }
            ],
            as: 'sum_prev_days'
        } },
        { $addFields: { sum_prev_days: { $arrayElemAt: [ '$sum_prev_days', 0 ] } } },
        { $addFields: { running_total: { $sum: [ '$value', '$sum_prev_days.summary' ] } } },
        { $project: { sum_prev_days: 0 } }
    ])

What we did: in the $lookup we select all documents with an earlier time and immediately compute their sum (using $group as the second stage of the lookup pipeline). $lookup puts the result into the first element of an array, so we extract that element and then compute the sum: the current value plus the sum of the previous values.

If you want to group transactions by day before calculating the running totals, insert a $group stage at the beginning of the pipeline, and also inside the $lookup pipeline:

    db.collection1.aggregate([
        { $group: {
            _id: { $dateToString: { format: '%Y-%m-%d', date: '$time' } },
            value: { $sum: '$value' }
        } },
        { $lookup: {
            from: 'collection1',
            let: { date_to: '$_id' },
            pipeline: [
                { $group: {
                    _id: { $dateToString: { format: '%Y-%m-%d', date: '$time' } },
                    value: { $sum: '$value' }
                } },
                { $match: { $expr: { $lt: [ '$_id', '$$date_to' ] } } },
                { $group: { _id: null, summary: { $sum: '$value' } } }
            ],
            as: 'sum_prev_days'
        } },
        { $addFields: { sum_prev_days: { $arrayElemAt: [ '$sum_prev_days', 0 ] } } },
        { $addFields: { running_total: { $sum: [ '$value', '$sum_prev_days.summary' ] } } },
        { $project: { sum_prev_days: 0 } }
    ])

Result:

    { "_id" : "2013-10-10", "value" : 3, "running_total" : 3 }
    { "_id" : "2013-10-11", "value" : 7, "running_total" : 10 }
    { "_id" : "2013-10-12", "value" : 5, "running_total" : 15 }
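The self-$lookup makes MongoDB re-scan the collection once per document, so the work is quadratic in the number of days. A minimal plain-JavaScript sketch of that per-day "look back and sum" logic (names are illustrative):

```javascript
// Sketch of the self-$lookup: for each day, filter for earlier days
// and sum their values (O(n^2), like the lookup re-scan).
function runningTotals(days) {
  // days: [{ _id: 'YYYY-MM-DD', value: n }, ...]
  return days.map(doc => {
    const sumPrev = days
      .filter(d => d._id < doc._id)          // $match: { $lt: ['$_id', '$$date_to'] }
      .reduce((s, d) => s + d.value, 0);     // $group: { summary: { $sum: '$value' } }
    return { _id: doc._id, value: doc.value, running_total: doc.value + sumPrev };
  });
}

const res3 = runningTotals([
  { _id: '2013-10-10', value: 3 },
  { _id: '2013-10-11', value: 7 },
  { _id: '2013-10-12', value: 5 },
]);
// res3[2] is { _id: '2013-10-12', value: 5, running_total: 15 }
```

Note that unlike the lookup, this sketch needs no sort: each day's total depends only on which days precede it, not on input order.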
