Multiprocessing python. Pool stuck after long execution

I am developing a tool that analyzes huge files. To make it faster, I introduced multiprocessing, and everything seems to work fine. For this I use multiprocessing.Pool, creating N worker processes that handle the pieces of work I created beforehand.

    from multiprocessing import Pool

    pool = Pool(processes=params.nthreads)
    for chunk in chunk_list:
        pool.apply_async(__parallel_quant, [filelist, chunk, outfilename])
    pool.close()
    pool.join()

As you can see, this is a standard use of Pool, nothing special.

Recently I found a problem when running a really large amount of data. A standard run takes about 2 hours with 16 workers, but I have a special case that takes about 8 hours because of the really large number of files and their size.

The problem is that when I run this case, execution goes fine until the end: most of the children finish properly, but then it gets stuck in

 <built-in method recv of _multiprocessing.Connection object at remote 0x3698db0> 

Since this child never finishes, the parent does not wake up and execution stops.

This situation only occurs when the input files are very large, so I was wondering whether there is some default timeout that might be causing this problem.

I am using Python 2.7, multiprocessing 0.70a1,

and my machine is CentOS 7 (32 cores, 64 GB of RAM).

Thank you in advance for your help.

Jordi

1 answer

From the multiprocessing programming guidelines:

Avoid shared state

 As far as possible one should try to avoid shifting large amounts of data between processes. 

If you need to split the processing of a file across several processes, it is better to instruct them how to retrieve the file fragments themselves rather than sending the pieces of data to them.

Try passing the chunk offset and block size to the child process. It can then extract its piece of the file using open() and seek(). You will see better performance and lower memory usage.
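
For illustration, here is a minimal sketch of that idea; the names read_chunk and bigfile.dat and the chunk size are hypothetical, not taken from the original code:

    from multiprocessing import Pool

    CHUNK_SIZE = 1024 * 1024  # hypothetical block size

    def read_chunk(args):
        # The worker receives only (filename, offset, size) -- cheap to pickle --
        # and reads its own fragment instead of having the parent send the data.
        filename, offset, size = args
        with open(filename, 'rb') as f:
            f.seek(offset)
            data = f.read(size)
        # ... analyze `data` here and return a small result ...
        return len(data)

    if __name__ == '__main__':
        # One work item per block of the (hypothetical) input file.
        chunks = [('bigfile.dat', i * CHUNK_SIZE, CHUNK_SIZE) for i in range(16)]
        pool = Pool(processes=4)
        results = pool.map(read_chunk, chunks)
        pool.close()
        pool.join()

Only the small results travel back through the pool's pipes, so the parent/child Connection traffic stays minimal.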
