Multiprocessing Queue - Why is memory consumption increasing?

The following script generates 100 random dictionaries with 100,000 entries each, puts every (key, value) tuple into a queue, and a separate process reads from that queue:

import multiprocessing as mp
import numpy.random as nr


def get_random_dict(_dummy):
    # 100,000 keys mapped to random 10-digit integers
    return dict((k, v) for k, v in enumerate(nr.randint(pow(10, 9), pow(10, 10), pow(10, 5))))

def consumer(q):
    # drain the queue until the 'STOP' sentinel arrives
    for (k, v) in iter(q.get, 'STOP'):
        pass

q = mp.Queue()
p = mp.Process(target=consumer, args=(q,))
p.start()
for d in mp.Pool(1).imap_unordered(get_random_dict, xrange(100)):
    for k, v in d.iteritems():
        q.put((k, v))
q.put('STOP')
p.join()

I expected memory usage to stay roughly constant, because the consumer process pulls data out of the queue as fast as the main process puts it in. I also checked that the data does not accumulate in the queue.

However, when I track memory consumption, it keeps growing while the script runs. If I replace the imap_unordered loop with for _ in xrange(100): d = get_random_dict(None), memory consumption stays constant (see the sketch below). What explains this?
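For reference, the constant-memory control loop I mean is simply this (the queue, the consumer and get_random_dict are unchanged; the dummy argument is filled with None):

for _ in xrange(100):
    d = get_random_dict(None)  # built in the main process, one dict alive at a time
    for k, v in d.iteritems():
        q.put((k, v))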


Pool.imap is not quite the same as plain imap. It is similar in that it can be used like imap and returns an iterator, but the implementation is completely different: the backing pool works through every task it is given as fast as it can, no matter how quickly you consume the iterator. If you only wanted a result computed at the moment it is requested, there would be no point in using multiprocessing at all; you could just use itertools.imap and be done with it.

So memory consumption grows because the pool produces dictionaries faster than the main process can forward them: pushing 100,000 (key, value) tuples into the Queue one by one takes far longer than generating a single dictionary. The data indeed does not pile up in the Queue itself; it piles up in the results the pool has already computed but the main loop has not yet picked up (they are buffered on the main-process side until the iterator reaches them).
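For comparison, here is a minimal single-process sketch of the lazy behaviour mentioned above (assuming q, consumer and get_random_dict from the original script are unchanged); each dictionary is only built when the loop asks for it, so at most one dictionary is alive at a time:

import itertools

for d in itertools.imap(get_random_dict, xrange(100)):
    for k, v in d.iteritems():
        q.put((k, v))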


I think the key point is that multiprocessing.Pool keeps its worker busy and produces results as fast as it can (that is what a Pool is for), regardless of how quickly you iterate over what imap_unordered returns. Forwarding a single dictionary means 100,000 separate q.put calls, so while the main loop is still busy with one dictionary, the Pool has already finished several more, and those finished dictionaries have to be held in memory somewhere.

To check this, I added a couple of print statements:

...
def get_random_dict(_dummy):
    print 'generating dict'
    ...
...
for d in mp.Pool(1).imap_unordered(get_random_dict, xrange(100)):
    print 'next d'
    ...

The output looks something like this:

generating dict
generating dict
next d
generating dict
generating dict
generating dict
generating dict
generating dict
next d
...

So dictionaries are generated noticeably faster than they are consumed, which means the extra ones must be kept in memory somewhere (presumably inside the Pool's result machinery).

In other words, the memory growth comes from get_random_dict running ahead inside the Pool's *map call, not from the Queue.
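If you want to keep the Pool but stop it from running ahead, one option (a sketch of my own, not something built into multiprocessing) is to throttle task submission with a threading semaphore that is released only after a dictionary has been fully forwarded to the Queue; this relies on imap_unordered pulling arguments from its iterable lazily:

import multiprocessing as mp
import threading

import numpy.random as nr

MAX_AHEAD = 2                      # at most this many dicts in flight at once
slots = threading.BoundedSemaphore(MAX_AHEAD)

def get_random_dict(_dummy):
    return dict((k, v) for k, v in enumerate(nr.randint(pow(10, 9), pow(10, 10), pow(10, 5))))

def consumer(q):
    for (k, v) in iter(q.get, 'STOP'):
        pass

def throttled(n):
    # consumed lazily by the pool; blocks until a slot is free
    for i in xrange(n):
        slots.acquire()
        yield i

if __name__ == '__main__':
    q = mp.Queue()
    p = mp.Process(target=consumer, args=(q,))
    p.start()
    pool = mp.Pool(1)
    for d in pool.imap_unordered(get_random_dict, throttled(100)):
        for k, v in d.iteritems():
            q.put((k, v))
        slots.release()            # this dict has been forwarded; allow the next task
    pool.close()
    pool.join()
    q.put('STOP')
    p.join()

With MAX_AHEAD = 2 the worker can be at most two dictionaries ahead of the main loop, so memory stays bounded at the cost of the worker occasionally sitting idle.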

