How to iterate over and update documents using PyMongo?

I have a simple, single-client setup for MongoDB and PyMongo 2.6.3. The goal is to iterate over each document in the collection and update (save) each one in the process. The approach I'm using looks something like this:

    cursor = collection.find({})
    index = 0
    count = cursor.count()
    while index != count:
        doc = cursor[index]
        print 'updating doc ' + doc['name']
        # modify doc ..
        collection.save(doc)
        index += 1
    cursor.close()

The problem is that save seems to reorder the documents in the cursor. For example, if my collection consists of 3 documents (_id omitted for clarity):

    { "name": "one" }
    { "name": "two" }
    { "name": "three" }

the above program outputs:

    > updating doc one
    > updating doc two
    > updating doc two

If, however, the collection.save(doc) call is removed, the output becomes:

    > updating doc one
    > updating doc two
    > updating doc three

Why is this happening? What is the right way to safely iterate and update documents in a collection?

+7
python mongodb pymongo
3 answers

Found the answer in the MongoDB documentation:

Because the cursor is not isolated during its lifetime, intervening write operations on a document may cause the cursor to return that document more than once if the document has changed. To handle this situation, see the information on snapshot mode.

Snapshot mode is enabled on the cursor and provides a good guarantee:

snapshot() traverses the index on the _id field and guarantees that the query returns each document (with respect to the value of the _id field) no more than once.
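The two quotes above can be illustrated without a database. What follows is a toy model in plain Python, not MongoDB code, and it does not try to reproduce the exact output from the question. It assumes, purely for illustration, that a document which grows on update is relocated to the end of storage, which is one way a plain scan could meet the same document again. The names scan_in_storage_order and scan_in_id_order and the "touched" field are made up for this sketch.

```python
def scan_in_storage_order(docs):
    """Visit slots in storage order, updating each document as it is seen.

    Toy assumption: an update that grows a document moves it to the end
    of storage, leaving an empty slot (None) behind.
    """
    storage = [dict(d) for d in docs]  # work on copies
    visited = []
    slot = 0
    while slot < len(storage):
        doc = storage[slot]
        if doc is not None:
            visited.append(doc["name"])
            if "touched" not in doc:
                # First update: the document grows and is relocated.
                doc["touched"] = True
                storage[slot] = None
                storage.append(doc)
        slot += 1
    return visited


def scan_in_id_order(docs):
    """Visit documents by ascending _id, so relocation cannot cause repeats."""
    by_id = {d["_id"]: dict(d) for d in docs}
    visited = []
    for _id in sorted(by_id):
        doc = by_id[_id]
        visited.append(doc["name"])
        doc["touched"] = True  # same update; storage position is irrelevant
    return visited


docs = [
    {"_id": 1, "name": "one"},
    {"_id": 2, "name": "two"},
    {"_id": 3, "name": "three"},
]
print(scan_in_storage_order(docs))  # every document shows up twice
print(scan_in_id_order(docs))       # every document shows up once
```

In this model the storage-order scan sees each relocated document a second time, while the _id-order walk sees each _id exactly once, which is the kind of guarantee the snapshot() quote describes.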

To enable snapshot mode using PyMongo:

 cursor = collection.find(spec={},snapshot=True) 

This is according to the PyMongo find() documentation. I confirmed that this fixed my problem.
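For completeness, here is a minimal sketch of the whole loop with snapshot mode, iterating the cursor directly instead of indexing into it. It assumes the PyMongo 2.x find(spec=..., snapshot=...) signature shown above and the collection.save() call from the question. The names update_all and modify are hypothetical, and collection stands for whatever collection object you already have.

```python
def update_all(collection, modify):
    """Apply modify(doc) to every document and save it back.

    Assumes a PyMongo 2.x-style collection: find(spec=..., snapshot=...)
    and save(doc), as used elsewhere in this thread.
    Returns the number of documents saved.
    """
    saved = 0
    cursor = collection.find(spec={}, snapshot=True)
    try:
        for doc in cursor:
            modify(doc)           # change the document in place
            collection.save(doc)  # write it back, matched by its _id
            saved += 1
    finally:
        cursor.close()  # close the cursor even if modify() raises
    return saved
```

A call might look like update_all(collection, lambda doc: doc.update(processed=True)), where processed is just an example field name.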

+11

Snapshot mode does the job.

From PyMongo 2.9 onwards, however, the syntax is slightly different:

 cursor = collection.find(modifiers={"$snapshot": True}) 

or for any version,

 cursor = collection.find({"$snapshot": True}) 

according to the PyMongo documentation.

+5

I couldn't reproduce your situation, but off the top of my head: because fetching the results the way you do retrieves them one by one from the database, you may actually be creating more results as you go (saving one, then fetching the next).

You can try holding the results in a list instead (this way you fetch all the results at once, which may be heavy depending on your query):

    cursor = collection.find({})
    # index = 0
    results = [res for res in cursor]  # replaces: count = cursor.count()
    cursor.close()

    # Replaces "while index != count": iterating the list needs no counter.
    for res in results:
        # doc = cursor[index] is not needed; 'res' holds the current record.
        print 'updating doc ' + res['name']
        # modify res ..
        collection.save(res)
        # index += 1 is not needed either.
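A lighter variant of the same idea, sketched under assumptions: rather than holding every full document in memory, collect only the _id values first, then fetch, modify, and save one document at a time. Because the list of ids is fixed before any write happens, each document is processed at most once. The names update_by_ids and modify are hypothetical. The calls find() with a field selector as second positional argument, find_one(), and save() are assumed to behave as in PyMongo 2.x.

```python
def update_by_ids(collection, modify):
    """Read all _id values up front, then update documents one by one.

    Returns the number of documents that were modified and saved.
    """
    # Request only the _id field so the in-memory list stays small.
    ids = [doc["_id"] for doc in collection.find({}, {"_id": 1})]
    updated = 0
    for _id in ids:
        doc = collection.find_one({"_id": _id})
        if doc is None:
            continue  # document was removed after the ids were read
        modify(doc)
        collection.save(doc)
        updated += 1
    return updated
```

The trade-off is one extra query per document in exchange for a much smaller list in memory. Note that save() was deprecated in PyMongo 3.0 in favor of replace_one(), so on newer drivers the write call would need to change.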

Hope this helps

+1