Inspired by @Rafael Valero's answer, and after fixing the last bug in my code and making it more general, I wrote a generator function that iterates over a Mongo collection in chunks, with an optional query and projection:
    def iterate_by_chunks(collection, chunksize=1, start_from=0, query=None, projection=None):
        """Yield cursors that each cover at most `chunksize` matching documents."""
        query = query or {}  # avoid a shared mutable default argument
        chunksize = int(chunksize)
        # Cursor.count() was removed in PyMongo 4; count_documents() replaces it.
        total = collection.count_documents(query)
        for start in range(start_from, total, chunksize):
            # Slicing a cursor applies skip/limit to the query.
            yield collection.find(query, projection=projection)[start:min(start + chunksize, total)]
So, for example, you first create the generator like this:

    mess_chunk_iter = iterate_by_chunks(db_local.conversation_messages, 200, 0, query={}, projection=projection)

and then iterate over it chunk by chunk:
    chunk_n = 0
    total_docs = 0
    for docs in mess_chunk_iter:
        chunk_n += 1
        chunk_len = 0
        for d in docs:
            chunk_len += 1
            total_docs += 1
        print(f'chunk #: {chunk_n}, chunk_len: {chunk_len}')
    print("total docs iterated: ", total_docs)
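To check the chunk boundaries without a running MongoDB server, the index arithmetic can be reproduced on its own. This is only a sketch: `chunk_bounds` is a hypothetical helper that is not part of PyMongo, and `total` stands in for the number of documents matching the query.

```python
def chunk_bounds(total, chunksize=1, start_from=0):
    """Return the (start, stop) index pairs that cover [start_from, total)."""
    chunksize = int(chunksize)
    starts = range(start_from, total, chunksize)
    # The last pair is clipped to `total`, so the final chunk may be shorter.
    return [(start, min(start + chunksize, total)) for start in starts]


# 450 matching documents read in chunks of 200.
print(chunk_bounds(450, chunksize=200))  # [(0, 200), (200, 400), (400, 450)]
# An empty result set produces no chunks at all.
print(chunk_bounds(0, chunksize=200))    # []
```

Each pair corresponds to one cursor slice `[start:stop]` yielded by the generator, so the last chunk simply contains whatever documents remain.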
alexander ostrikov