I have a list of image paths that I want to split between processes or threads, so that each worker handles part of the list. Processing involves loading an image from disk, performing some calculations, and returning the result. I am using multiprocessing.Pool on Python 2.7.
This is how I create the pool of workers:
    import glob
    import multiprocessing

    def ProcessParallel(classifier, path):
        files = glob.glob(path + "\*.png")
        # Sort by the numeric token between the '--' separators in the file name
        files_sorted = sorted(files, key=lambda file_name: int(file_name.split('--')[1]))
        # Four worker processes, each initialized once with the shared classifier
        p = multiprocessing.Pool(processes=4, initializer=Initializer, initargs=(classifier,))
        data = p.map(LoadAndClassify, files_sorted)
        return data
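LoadAndClassify itself is not important for the question; as a rough sketch only (the open() call stands in for the real image loading, and classify() is a hypothetical method of my classifier):

    def LoadAndClassify(file_name):
        # indexing_classifier is the module-level global that Initializer
        # sets in every worker (see EDIT 1 below)
        with open(file_name, 'rb') as f:   # stands in for the real image loading
            image_bytes = f.read()
        # stands in for the real calculations; classify() is hypothetical
        return indexing_classifier.classify(image_bytes)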
The problem I encountered: when I log the initialization time in my Initializer function, I can see that the workers are not initialized in parallel; each worker starts roughly 5 seconds after the previous one. Here are the logs for reference:
    2016-08-08 12:38:32,043 - custom_logging - INFO - Worker started
    2016-08-08 12:38:37,647 - custom_logging - INFO - Worker started
    2016-08-08 12:38:43,187 - custom_logging - INFO - Worker started
    2016-08-08 12:38:48,634 - custom_logging - INFO - Worker started
I tried using multiprocessing.pool.ThreadPool instead, and it does launch all workers at the same time.
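For reference, switching to the thread pool only requires swapping the pool class; a minimal sketch, assuming the same Initializer and LoadAndClassify as above:

    from multiprocessing.pool import ThreadPool

    # Worker threads share the interpreter process, so there is no per-worker
    # process spawn and all four initializers run right away.
    p = ThreadPool(processes=4, initializer=Initializer, initargs=(classifier,))
    data = p.map(LoadAndClassify, files_sorted)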
I know how multiprocessing works on Windows: we need an if __name__ == '__main__' guard to protect our code from spawning processes endlessly. The catch in my case is that the script is hosted in IIS via FastCGI, so it is not the main script; it is executed by the FastCGI process (a wfastcgi.py script is responsible for this). Now, wfastcgi.py does have a main guard, and my logs show that I am not creating an infinite number of processes.
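For comparison, this is the guard a stand-alone Windows script would need; in my setup that role is played by wfastcgi.py rather than by my own module (minimal self-contained sketch):

    import multiprocessing

    def main():
        # Pool creation must happen only inside the guarded block below
        p = multiprocessing.Pool(processes=4)
        print(p.map(abs, [-1, -2, -3]))

    if __name__ == '__main__':
        # On Windows there is no fork(): each child re-imports this module,
        # so an unguarded Pool() at import time would spawn workers recursively.
        main()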
Now I want to know whether this is the reason the multiprocessing Pool does not create its workers at the same time. I would really appreciate any help.
EDIT 1: Here is my Initializer function:

    def Initializer(classifier):
        global indexing_classifier
        logger.info('Worker started')
        indexing_classifier = classifier