New to MongoDB / PyMongo. Currently using the latest version, v3.2.2.
Does insert_many not work as intended? I've noticed that even when passing a generator to db.col.insert_many, memory usage still spikes, which makes it difficult to insert millions of documents. (I understand that system memory should be greater than the collection size for best performance, so perhaps this is nothing to worry about?)
I was under the impression that if you pass a generator to insert_many, PyMongo would buffer the inserts into batches of 16 or 32 MB on its own?
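For reference, the batching limits PyMongo works with can be read straight off the client. This is a minimal sketch assuming a local mongod is running; the values in the comments are just the typical defaults for a 3.2-era server:

import pymongo

client = pymongo.MongoClient()
# Server-reported limits that PyMongo uses when splitting bulk writes:
print("max BSON document size: %d" % client.max_bson_size)       # typically 16 MB
print("max message size: %d" % client.max_message_size)          # typically 48 MB
print("max write batch size: %d" % client.max_write_batch_size)  # e.g. 1000 ops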
Performing this buffering/chunking manually solves the problem; see below:
Example 1: direct insert_many (large memory usage: 2.625 GB)
Example 2: 'buffered' insert_many (expected low memory usage: ~300 MB)
from itertools import chain, islice
import pymongo

client = pymongo.MongoClient()
db = client['test']
def generate_kv(N):
    # Lazily generate N small documents.
    for i in range(N):
        yield {'x': i}

print("example 1")
db.testcol.drop()
db.testcol.insert_many(generate_kv(5000000))
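(For what it's worth, figures like the ones above could be measured as the process's resident set size; a minimal sketch, assuming psutil is installed, which is not part of the original code:)

import os
import psutil  # assumption: only used here, for measuring memory

def rss_mb():
    # Resident set size of the current process, in MB.
    return psutil.Process(os.getpid()).memory_info().rss / (1024.0 * 1024.0)

print("RSS after example 1: %.1f MB" % rss_mb())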
def chunks(iterable, size=10000):
    # Lazily split an iterable into sub-generators of up to `size` items each.
    iterator = iter(iterable)
    for first in iterator:
        yield chain([first], islice(iterator, size - 1))

print("example 2")
db.testcol.drop()
for c in chunks(generate_kv(5000000)):
    db.testcol.insert_many(c)
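For completeness, a list-based chunker is a simpler alternative (a sketch; chunks_as_lists is a hypothetical name, not from the code above). It holds one full batch in memory at a time, but keeps working even if a consumer doesn't exhaust each chunk before requesting the next, which the chain/islice version assumes:

from itertools import islice

def chunks_as_lists(iterable, size=10000):
    # Materialize each batch as a list before yielding it.
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch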
Any ideas? Is this a bug, or am I using insert_many incorrectly?
Bryan