Old school.
p1.py
import csv
import pickle
import sys

with open("someFile", "r", newline="") as source:
    rdr = csv.reader(source)
    for line in enumerate(rdr):
        pickle.dump(line, sys.stdout.buffer)
p2.py
import pickle
import sys

while True:
    try:
        i, row = pickle.load(sys.stdin.buffer)
    except EOFError:
        break
    # csv gives strings, so convert before summing
    pickle.dump((i, sum(map(float, row))), sys.stdout.buffer)
p3.py
import pickle
import sys

while True:
    try:
        i, row = pickle.load(sys.stdin.buffer)
    except EOFError:
        break
    print(i, row)
Here is the final "multiprocessing" structure:
python p1.py | python p2.py | python p3.py
Yes, the shell has knitted these together at the OS level. It seems simpler to me, and it works very nicely.

Yes, there is slightly more overhead in using pickle (or cPickle). The simplification, however, seems worth the effort.
If you want the file name to be an argument to p1.py, that is an easy change.
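One way that change might look is sketched below; the `main` function and the `target` parameter are illustrative choices, not part of the original scripts.

```python
import csv
import pickle
import sys

def main(filename, target=None):
    # Default to stdout's byte stream so the output can still be piped.
    out = target if target is not None else sys.stdout.buffer
    with open(filename, "r", newline="") as source:
        for item in enumerate(csv.reader(source)):
            pickle.dump(item, out)

if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it as `python p1.py someFile | python p2.py | ...` and the rest of the pipeline is unchanged.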
More importantly, a function such as the following is very handy:
def get_stdin():
    while True:
        try:
            yield pickle.load(sys.stdin.buffer)
        except EOFError:
            return
This allows you to do this:
for item in get_stdin():
    process(item)  # whatever processing each item needs
That is very simple, but it does not easily let you run multiple copies of p2.py.
You have two problems: fan-out and fan-in. p1.py must somehow fan out to several copies of p2.py. And those copies of p2.py must somehow merge their results into a single p3.py.
The old-school approach to fan-out is a "push" architecture, which is very effective.
Theoretically, multiple copies of p2.py pulling from a shared queue is the optimal allocation of resources. This is often ideal, but it is also a fair amount of programming. Is that programming really necessary? Or would round-robin processing be good enough?
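The shared-queue alternative just described can be sketched as follows. Threads are used here only to keep the sketch small and self-contained; the same structure (workers pulling from one queue, a sentinel per worker to shut down) applies to separate processes.

```python
import queue
import threading

def worker(tasks, results):
    while True:
        item = tasks.get()
        if item is None:            # sentinel: no more work
            break
        i, row = item
        results.put((i, sum(row)))

tasks, results = queue.Queue(), queue.Queue()
workers = [threading.Thread(target=worker, args=(tasks, results))
           for _ in range(2)]
for w in workers:
    w.start()

data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
for item in enumerate(data):
    tasks.put(item)
for _ in workers:
    tasks.put(None)                 # one sentinel per worker
for w in workers:
    w.join()

totals = dict(results.get() for _ in data)
```

Whichever worker is free takes the next task, so a slow item does not hold up the others; that is the resource-allocation benefit the paragraph describes.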
In practice, you will find that having p1.py do a simple round-robin deal among several copies of p2.py can be quite good. You would configure p1.py to feed n copies of p2.py through named pipes, with each p2.py reading from its own pipe.
What if one p2.py gets all the "worst case" data and lags far behind? Yes, round-robin is not perfect. But it is better than a single p2.py, and you can address this bias with simple randomization.
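The round-robin fan-out in p1.py might be sketched like this. The pipe names `p2_0` and `p2_1` are illustrative; in practice they would be named pipes created with `mkfifo`, each read by one copy of p2.py. The sample-input step is only there to make the sketch self-contained.

```python
import csv
import itertools
import pickle

# Create a tiny sample input so the sketch is self-contained;
# in practice p1.py would read the real CSV file.
with open("someFile", "w", newline="") as f:
    f.write("1,2,3\n4,5,6\n")

# "p2_0" and "p2_1" stand in for named pipes made with `mkfifo`;
# each copy of p2.py would read from one of them.
pipes = [open(f"p2_{i}", "wb") for i in range(2)]
targets = itertools.cycle(pipes)    # round-robin over the pipes

with open("someFile", "r", newline="") as source:
    for item in enumerate(csv.reader(source)):
        pickle.dump(item, next(targets))

for p in pipes:
    p.close()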
Fan-in from several copies of p2.py into one p3.py is a bit more complex. At this point the old-school approach stops being an advantage: p3.py needs to read from multiple named pipes, using the select library to interleave the reads.
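That select-based fan-in might look roughly like this. The names `p2_out_0` and `p2_out_1` are illustrative; with real FIFOs, p3.py would open them once the p2.py writers have started, and the sample-data step below would not exist. This assumes a Unix system, where select works on pipes and files.

```python
import pickle
import select

# Create two sample inputs standing in for the named pipes, so the
# sketch is self-contained.
for k, name in enumerate(("p2_out_0", "p2_out_1")):
    with open(name, "wb") as f:
        pickle.dump((k, 10 * k), f)

sources = [open(name, "rb") for name in ("p2_out_0", "p2_out_1")]
results = []
while sources:
    # select reports which pipes have data ready to read
    readable, _, _ = select.select(sources, [], [])
    for f in readable:
        try:
            results.append(pickle.load(f))
        except EOFError:
            f.close()               # this pipe is exhausted
            sources.remove(f)

for i, total in sorted(results):
    print(i, total)
```

The loop alternates among whichever pipes have data, so no single slow p2.py blocks the others from being drained.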