Very simple parallel programming in Python

I have a simple Python script that uses two much more complex Python scripts and does something with the results.

I have two modules, Foo and Bar, and my code is as follows:

    import Foo
    import Bar

    output = []
    a = Foo.get_something()
    b = Bar.get_something_else()
    output.append(a)
    output.append(b)

Both methods take a long time to run, and neither of them depends on the other, so running them in parallel is the obvious solution. How can I achieve this while making sure the order is maintained: whichever of them finishes first has to wait for the other to complete before the script can continue?

Let me know if I haven't made myself clear enough; I tried to keep the example code as simple as possible.

+9
python concurrency
2 answers

In general, you need to use threading for this.

First create a thread for each thing you want to run in parallel:

    import threading
    import Foo
    import Bar

    results = {}

    def get_a():
        results['a'] = Foo.get_something()

    a_thread = threading.Thread(target=get_a)
    a_thread.start()

    def get_b():
        results['b'] = Bar.get_something_else()

    b_thread = threading.Thread(target=get_b)
    b_thread.start()

Then, to make sure both of them have finished, call .join() on both:

    a_thread.join()
    b_thread.join()

At this point, your results will be in results['a'] and results['b'], so if you want an ordered list:

    output = [results['a'], results['b']]

Note: if both tasks are inherently CPU-intensive, you might want to look at multiprocessing instead. Because of the Python GIL, a given Python process will only ever use one CPU core, whereas multiprocessing can distribute the tasks across separate cores. However, it has somewhat higher overhead than threading, so for less CPU-intensive tasks it may end up less efficient.
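As a side note, the standard library's concurrent.futures module wraps this whole pattern; here is a minimal sketch (assuming the same Foo and Bar modules) where calling .result() in a fixed order keeps the output ordered no matter which task finishes first:

    import concurrent.futures
    import Foo
    import Bar

    # ThreadPoolExecutor suits I/O-bound tasks; swap in ProcessPoolExecutor
    # to sidestep the GIL for CPU-bound tasks.
    with concurrent.futures.ThreadPoolExecutor() as executor:
        future_a = executor.submit(Foo.get_something)
        future_b = executor.submit(Bar.get_something_else)
        # .result() blocks until that particular task is done, so the
        # list is always [a, b] regardless of completion order.
        output = [future_a.result(), future_b.result()]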

+22
Here is the multiprocessing version of your program.

    import multiprocessing
    import Foo
    import Bar

    def get_a(results):
        results['a'] = Foo.get_something()

    def get_b(results):
        results['b'] = Bar.get_something_else()

    if __name__ == '__main__':
        # A plain module-level dict would not work here: each child process
        # gets its own copy, so the parent's dict would stay empty. A Manager
        # dict is proxied across processes instead.
        manager = multiprocessing.Manager()
        results = manager.dict()

        process_a = multiprocessing.Process(target=get_a, args=(results,))
        process_b = multiprocessing.Process(target=get_b, args=(results,))
        process_a.start()
        process_b.start()
        process_a.join()  # join() is a call; a bare "process_a.join" does nothing
        process_b.join()

NOTE: In threading you have shared data structures, so you need to worry about locking to avoid incorrect data manipulation. Also, as Amber mentioned above, threading suffers from the GIL (Global Interpreter Lock) problem, and since both of your tasks are CPU-intensive, they will take longer because of the overhead of acquiring and releasing the lock between threads. If your tasks were I/O-intensive instead, this would not matter as much.
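To illustrate the locking point, here is a minimal sketch with a hypothetical shared counter (not the Foo/Bar code above): two threads doing unprotected read-modify-write updates can interleave and lose increments, which the lock prevents:

    import threading

    counter = 0
    lock = threading.Lock()

    def increment():
        global counter
        for _ in range(100_000):
            # Without the lock, the read-modify-write of "counter" can
            # interleave between threads and updates can be lost.
            with lock:
                counter += 1

    threads = [threading.Thread(target=increment) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)  # always 200000 with the lock; possibly less without it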

With processes, on the other hand, there are no shared data structures by default (which is why the example above passes an explicitly shared Manager dict), so there are no locks to worry about, and since each process runs independently of the GIL, you really get the full power of multiple processors.

A simple way to put it: a process is like a thread, minus the shared data structures (everything runs in isolation and communicates by message passing).
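A minimal sketch of that message-passing style, assuming the same Foo and Bar modules: each process computes in isolation and sends its result back over a multiprocessing.Queue instead of writing to shared state:

    import multiprocessing
    import Foo
    import Bar

    def worker(key, func, queue):
        # Compute in isolation, then send a (key, result) message back.
        queue.put((key, func()))

    if __name__ == '__main__':
        queue = multiprocessing.Queue()
        procs = [
            multiprocessing.Process(target=worker, args=('a', Foo.get_something, queue)),
            multiprocessing.Process(target=worker, args=('b', Bar.get_something_else, queue)),
        ]
        for p in procs:
            p.start()
        # Drain the queue (one message per process) before joining.
        results = dict(queue.get() for _ in procs)
        for p in procs:
            p.join()
        output = [results['a'], results['b']]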

Check out dabeaz.com; he once gave a good presentation on concurrent programming.

+8
