How to share a cache between processes?

I use caching here to speed up some fairly expensive processing. lru_cache works well and speeds things up significantly. However, when I multiprocess, each process creates its own separate cache, so there are 8 copies of the same thing. That doesn't appear to be a problem until the box runs out of memory and bad things happen as a result. Ideally I need a cache of about 300 items for this application, but 8 * 300 won't fit in the 7GB I have to work with, whereas I think 1 * 300 would.
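Roughly, the setup looks like this (names and sizes are just illustrative):

    import functools
    import multiprocessing
    import os

    @functools.lru_cache(maxsize=300)
    def get_stuff(key):
        # stand-in for the expensive processing; in reality this builds a large object
        return [key] * 1000

    def process(key):
        # each worker process holds its own copy of get_stuff's cache, so with
        # 8 workers up to 8 * 300 cached results can be in memory at once
        return (os.getpid(), len(get_stuff(key)))

    if __name__ == "__main__":
        with multiprocessing.Pool(processes=8) as pool:
            results = pool.map(process, range(100))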

Is there a way for all processes to use the same cache?

2 answers

I believe you can use a Manager to share a dict between processes. That should, in theory, let you use the same cache for all your functions.
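Something along these lines, with made-up names (slow_work stands in for the expensive function); note that the check-then-store on the shared dict is not atomic, so two workers can occasionally compute the same key:

    import multiprocessing

    def slow_work(key):
        # stand-in for the expensive computation
        return key * key

    def worker(args):
        key, shared_cache = args
        if key not in shared_cache:          # not atomic: a key may occasionally be computed twice
            shared_cache[key] = slow_work(key)
        return shared_cache[key]

    if __name__ == "__main__":
        with multiprocessing.Manager() as manager:
            shared_cache = manager.dict()    # one dict proxy shared by all workers
            with multiprocessing.Pool(processes=8) as pool:
                results = pool.map(worker, [(key, shared_cache) for key in range(100)])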

However, I think a more sensible design would be a single process that answers requests by looking them up in the cache and, on a miss, delegating the work to a subprocess and caching the result before returning it. You can do that easily with:

    import concurrent.futures
    import functools

    with concurrent.futures.ProcessPoolExecutor() as e:
        @functools.lru_cache(maxsize=None)
        def work(*args, **kwargs):
            # slow_work is the expensive function being wrapped
            return e.submit(slow_work, *args, **kwargs)

Note that work will return Future objects, which the consumer will have to wait on. The lru_cache will cache the Future objects so they are returned automatically; I believe you can access their data more than once, but I can't test it right now.
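Put together as a runnable sketch (slow_work is a placeholder for the expensive function):

    import concurrent.futures
    import functools

    def slow_work(x):
        # stand-in for the expensive computation
        return x * x

    if __name__ == "__main__":
        with concurrent.futures.ProcessPoolExecutor() as e:
            @functools.lru_cache(maxsize=None)
            def work(x):
                return e.submit(slow_work, x)

            f1 = work(10)       # submits slow_work(10) to the pool and caches the Future
            f2 = work(10)       # cache hit: the very same Future object, nothing is resubmitted
            assert f1 is f2
            print(f1.result())  # blocks until the worker finishes -> 100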

If you are not using Python 3, you will need to install the backported versions of concurrent.futures and functools.lru_cache.


Doh. Katri set me on the right track, and I would have gone with that answer, but, silly me, it's actually even simpler than that [for this application]:

The parent process can create the objects (from the single central cache) and pass each object instance to the pool worker as an argument ...

    @utils.lru_cache(maxsize=300)
    def get_stuff(key):
        return Stuff(key)

    def iterate_stuff(keys):
        for key in keys:
            yield get_stuff(key)  # doh, rather than just yielding the key, yield the cached obj

    def process(stuff_obj):
        # get_stuff(key) <-- do this in the parent, rather than the child
        stuff_obj.execute()

    def main():
        ...
        keys = get_list_of_keys()
        for result in pool.imap(process, iterate_stuff(keys)):
            evaluate(result)
        ...
