The following script generates 100 random dictionaries with 100,000 entries each, puts every (key, value) tuple on a queue, and a separate process reads from the queue:
import multiprocessing as mp
import numpy.random as nr

def get_random_dict(_dummy):
    # Build a dict with 100,000 entries whose values are random 10-digit integers.
    return dict((k, v) for k, v in enumerate(nr.randint(pow(10, 9), pow(10, 10), pow(10, 5))))

def consumer(q):
    # Drain the queue until the 'STOP' sentinel arrives.
    for (k, v) in iter(q.get, 'STOP'):
        pass

q = mp.Queue()
p = mp.Process(target=consumer, args=(q,))
p.start()

# A single-worker pool builds the dicts; the main process feeds them into the queue.
for d in mp.Pool(1).imap_unordered(get_random_dict, xrange(100)):
    for k, v in d.iteritems():
        q.put((k, v))
q.put('STOP')
p.join()
I expected memory usage to stay constant, because the consumer process pulls data off the queue as fast as the main process puts it on. I verified that data does not accumulate in the queue.
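The check itself was simple; a minimal sketch of the kind of watcher thread I used is below (watch_queue is a hypothetical helper; Queue.qsize() is approximate and raises NotImplementedError on macOS):

import threading
import time

def watch_queue(q, interval=1.0):
    # Hypothetical helper: print the approximate queue backlog once per interval.
    while True:
        print 'queue size:', q.qsize()
        time.sleep(interval)

watcher = threading.Thread(target=watch_queue, args=(q,))
watcher.daemon = True  # do not keep the interpreter alive just for the watcher
watcher.start()

The reported size stays near zero for the whole run, so the backlog is not in the queue itself.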
However, when I tracked memory consumption, it kept growing while the script ran. If I replace the imap_unordered loop with for _ in xrange(100): d = get_random_dict(None), memory consumption stays constant. What is the explanation?
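For reference, the in-process variant I compared against, together with the memory check I used, looks roughly like this (report_memory is a hypothetical helper built on the stdlib resource module; the trailing q.put('STOP') and p.join() lines stay unchanged):

import resource

def report_memory(tag):
    # Hypothetical helper: ru_maxrss is the peak resident set size
    # (kilobytes on Linux, bytes on macOS).
    print tag, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

# In-process variant: build the dicts in the main process instead of a pool worker.
for _ in xrange(100):
    d = get_random_dict(None)
    for k, v in d.iteritems():
        q.put((k, v))
    report_memory('after batch')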