How to write aggregation without exceeding the maximum document size?

I received the request exceeds maximum document size problem on request as follows:

 pipe = [ {"$match": { "birthday":{"$gte":datetime.datetime(1987, 1, 1, 0, 0)} }} ] res =db.patients.aggregate(pipe,allowDiskUse=True) 

I fixed it by adding the $project operator,

However, what if the document is still over 16MB , do I even use $project ?

What can I do? Any ideas? Thanks you

 pipe = [ {"$project": {"birthday":1, "id":1} }, {"$match": { "birthday":{"$gte":datetime.datetime(1987, 1, 1, 0, 0)} } } ] res =db.patients.aggregate(pipe,allowDiskUse=True) 

An exception

 OperationFailure: command SON([('aggregate', 'patients'), ('pipeline', [{'$match': {'birthday': {'$gte': datetime.datetime(1987, 1, 1, 0, 0)}}}]), ('allowDiskUse', True)]) on namespace tw_insurance_security_development.$cmd failed: exception: aggregation result exceeds maximum document size (16MB) 
+7
mongodb pymongo
source share
3 answers

By default, the result of aggregation is returned to you in a single BSON document, in which size restriction occurs. If you need to return more than you can:

  • results are displayed in the collection. You do this by completing your pipeline with

    {"$ out": "some-collection-name"}

    Then you request this collection as usual (you will need to delete it yourself when you are done with it)

  • have results returned as a cursor, indicating useCursor=True when calling aggregate.

Both of these options require mongodb 2.6: if you are still using mongodb 2.4, then this is just the fundamental aggregation limit.

+24
source share

As @Frederick said, a minimum of mongo 2.6 is required. For further use, here is a link from the mongo documentation, which works similarly to runCommand way but with db.collection.aggreagate, note that use the "cursor" option to limit the document, use the "allowDiskUse" parameter to limit sorting.

+3
source share

Use the following snippet

 db.patients.runCommand('aggregate', {pipeline: [ {"$project": {"birthday":1, "id":1}}, {"$match": { "birthday":{"$gte":datetime.datetime(1987, 1, 1, 0, 0)} }} ], allowDiskUse: true}) 

here allowDiskUse will help find more than 16 MB of data

-5
source share

All Articles