How to save memory when using Multiprocessing in Python?

I have a function that takes a node identifier of the graph as input and calculates something on the graph (without changing the graph object), then saves the results to the file system. My code looks like this:

    ...
    # graph file is being loaded
    g = loadGraph(gfile='data/graph.txt')
    # list of nodeids is being loaded
    nodeids = loadSeeds(sfile='data/seeds.txt')

    import multiprocessing as mp

    # parallel part of the code
    print ("entering the parallel part ..")
    num_workers = mp.cpu_count()  # 4 on my machine
    p = mp.Pool(num_workers)
    # _myParallelFunction(nodeid) {calculate something for nodeid in g and save it into a file}
    p.map(_myParallelFunction, nodeids)
    p.close()
    ...

The problem is that when I load the graph in Python it takes up a lot of memory (about 2G; it is actually a big graph with thousands of nodes), but when execution reaches the parallel part of the code (the parallel map call) it seems that every process is given its own separate copy of g, and I simply run out of memory on my machine (it has 6G of RAM and 3G of swap). So I wanted to ask whether there is a way to give every process the same copy of g, so that only the memory for one copy is needed? Any suggestions are welcome, and thanks in advance.
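(For context, a minimal sketch of one commonly used way to keep a single copy: hold the graph in a module-level global and rely on the operating system's fork semantics so the pool's workers read the parent's copy rather than receiving their own. This assumes a POSIX system where multiprocessing uses the fork start method, a dict-of-lists graph, and the loadGraph/loadSeeds helpers from the code above; CPython's reference counting can still gradually dirty the copy-on-write pages, so it is a mitigation rather than a guarantee.)

    import multiprocessing as mp

    # Module-level global: with the 'fork' start method (the default on Linux),
    # worker processes inherit this object via copy-on-write instead of each
    # receiving its own pickled copy.
    g = None

    def _myParallelFunction(nodeid):
        # Placeholder computation: here just the degree of nodeid in a
        # dict-of-lists graph; the real per-node work would go here.
        result = len(g[nodeid])
        with open('data/out/%s.txt' % nodeid, 'w') as out:
            out.write(str(result))

    def main():
        global g
        g = loadGraph(gfile='data/graph.txt')        # loader from the question
        nodeids = loadSeeds(sfile='data/seeds.txt')  # loader from the question
        # Create the pool only after g is loaded, so the forked workers already
        # hold a reference to the single copy living in the parent process.
        p = mp.Pool(mp.cpu_count())
        p.map(_myParallelFunction, nodeids)
        p.close()
        p.join()

    if __name__ == '__main__':
        main()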

+7
python multiprocessing python-multiprocessing
2 answers

If dividing the graph into smaller parts does not work, you may find a solution using this or multiprocessing.sharedctypes, depending on what kind of object represents your graph. A sketch of the sharedctypes route follows below.
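A minimal, hypothetical sketch of the multiprocessing.sharedctypes idea, assuming the graph can be flattened into CSR-style integer arrays (an offsets array plus a concatenated neighbour list) with node ids numbered 0..n-1; the names build_shared_graph and _degree are illustrative, not from the question:

    import multiprocessing as mp
    from multiprocessing import sharedctypes

    def build_shared_graph(adjacency):
        """adjacency: dict {nodeid: [neighbour ids]} with integer ids 0..n-1."""
        n = len(adjacency)
        offsets = [0]
        neighbours = []
        for node in range(n):
            neighbours.extend(adjacency[node])
            offsets.append(len(neighbours))
        # RawArray has no lock, which is fine for read-only access by workers.
        shared_offsets = sharedctypes.RawArray('l', offsets)
        shared_neighbours = sharedctypes.RawArray('l', neighbours)
        return shared_offsets, shared_neighbours

    _offsets = _neighbours = None

    def _init_worker(offsets, neighbours):
        # Store the shared arrays in globals so each task can read them
        # without any per-task pickling.
        global _offsets, _neighbours
        _offsets, _neighbours = offsets, neighbours

    def _degree(nodeid):
        # Example per-node computation that reads only the shared buffers.
        return _offsets[nodeid + 1] - _offsets[nodeid]

    if __name__ == '__main__':
        adjacency = {0: [1, 2], 1: [0], 2: [0]}   # stand-in for the real graph
        offsets, neighbours = build_shared_graph(adjacency)
        # Pass the shared arrays at pool start-up so the workers inherit them.
        p = mp.Pool(2, initializer=_init_worker, initargs=(offsets, neighbours))
        print(p.map(_degree, range(len(adjacency))))
        p.close()
        p.join()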

+1

Your comment indicates that you are processing one node at a time:

 # _myParallelFunction(nodeid) {calculate something for nodeid in g and save it into a file} 

I would create a generator function that yields one node from the graph file at a time and pass that generator to the p.map() function instead of the entire list of nodeids; a sketch follows below.
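A minimal sketch of that suggestion, assuming the node ids are stored one per line in the seeds file and reusing the question's _myParallelFunction; imap_unordered is used here so the generator is consumed lazily (plain map() also accepts a generator, but builds the full list internally first):

    import multiprocessing as mp

    def node_ids(sfile='data/seeds.txt'):
        # Yield one node id per line instead of loading the whole list into memory.
        with open(sfile) as f:
            for line in f:
                line = line.strip()
                if line:
                    yield line

    if __name__ == '__main__':
        p = mp.Pool(mp.cpu_count())
        # chunksize batches the ids sent to each worker; the results are ignored
        # because _myParallelFunction writes its output to files itself.
        for _ in p.imap_unordered(_myParallelFunction, node_ids(), chunksize=100):
            pass
        p.close()
        p.join()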

+1
