How can I write to stdout atomically in Python?

I have read in several places that print is not thread-safe and that the workaround is to use sys.stdout.write instead, but that does not work for me either: writes to stdout are still not atomic.

Here is a short example (the file is named parallelExperiment.py):

    import os
    import sys
    from multiprocessing import Pool

    def output(msg):
        msg = '%s%s' % (msg, os.linesep)
        sys.stdout.write(msg)

    def func(input):
        output(u'pid:%d got input \"%s\"' % (os.getpid(), str(input)))

    def executeFunctionInParallel(funcName, inputsList, maxParallelism):
        output(u'Executing function %s on input of size %d with maximum parallelism of %d' % (
            funcName.__name__, len(inputsList), maxParallelism))
        parallelismPool = Pool(processes=maxParallelism)
        executeBooleanResultsList = parallelismPool.map(funcName, inputsList)
        parallelismPool.close()
        output(u'Function %s executed on input of size %d with maximum parallelism of %d' % (
            funcName.__name__, len(inputsList), maxParallelism))
        # if all parallel executions executed well - the boolean results list should all be True
        return all(executeBooleanResultsList)

    if __name__ == "__main__":
        inputsList=[str(i) for i in range(20)]
        executeFunctionInParallel(func, inputsList, 4)

Here is the output:

I. The output of running python parallelExperiment.py (note that the word "pid" is garbled in some lines):

    Executing function func on input of size 20 with maximum parallelism of 4
    ppid:2240 got input "0"
    id:4960 got input "2"
    pid:4716 got input "4"
    pid:4324 got input "6"
    ppid:2240 got input "1"
    id:4960 got input "3"
    pid:4716 got input "5"
    pid:4324 got input "7"
    ppid:4960 got input "8"
    id:2240 got input "10"
    pid:4716 got input "12"
    pid:4324 got input "14"
    ppid:4960 got input "9"
    id:2240 got input "11"
    pid:4716 got input "13"
    pid:4324 got input "15"
    ppid:4960 got input "16"
    id:2240 got input "18"
    ppid:2240 got input "19"
    id:4960 got input "17"
    Function func executed on input of size 20 with maximum parallelism of 4

II. The output of running python parallelExperiment.py > parallelExperiment.log, which redirects stdout to the file parallelExperiment.log (note that the line order is wrong: the messages printed before and after the call to executeFunctionInParallel, which calls func in parallel, should appear at the start and at the end of the output):

    pid:3244 got input "4"
    pid:3244 got input "5"
    pid:3244 got input "12"
    pid:3244 got input "13"
    pid:240 got input "0"
    pid:240 got input "1"
    pid:240 got input "8"
    pid:240 got input "9"
    pid:240 got input "16"
    pid:240 got input "17"
    pid:1268 got input "2"
    pid:1268 got input "3"
    pid:1268 got input "10"
    pid:1268 got input "11"
    pid:1268 got input "18"
    pid:1268 got input "19"
    pid:3332 got input "6"
    pid:3332 got input "7"
    pid:3332 got input "14"
    pid:3332 got input "15"
    Executing function func on input of size 20 with maximum parallelism of 4
    Function func executed on input of size 20 with maximum parallelism of 4
python output atomic multiprocessing stdout
2 answers

This happens because multiprocessing.Pool uses separate processes rather than threads, so you need explicit synchronization between processes. Note the example at the link; it solves your problem.

    import os
    import sys
    from multiprocessing import Pool, Lock

    lock = Lock()

    def output(msg):
        msg = '%s%s' % (msg, os.linesep)
        with lock:
            sys.stdout.write(msg)

    def func(input):
        output(u'pid:%d got input \"%s\"' % (os.getpid(), str(input)))

    def executeFunctionInParallel(funcName, inputsList, maxParallelism):
        output(u'Executing function %s on input of size %d with maximum parallelism of %d' % (
            funcName.__name__, len(inputsList), maxParallelism))
        parallelismPool = Pool(processes=maxParallelism)
        executeBooleanResultsList = parallelismPool.map(funcName, inputsList)
        parallelismPool.close()
        output(u'Function %s executed on input of size %d with maximum parallelism of %d' % (
            funcName.__name__, len(inputsList), maxParallelism))
        # if all parallel executions executed well - the boolean results list should all be True
        return all(executeBooleanResultsList)

    if __name__ == "__main__":
        inputsList=[str(i) for i in range(20)]
        executeFunctionInParallel(func, inputsList, 4)

If you want to avoid locking and are happy to drop down to a lower-level interface, you can get POSIX O_APPEND behavior using os.open and os.write (if your system supports it); see also "Is file append atomic in UNIX?".
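
A minimal sketch of that idea, assuming a POSIX-like system and that each record fits in one os.write call. The helper names open_append and write_record are hypothetical. Whether concurrent appends actually stay intact depends on the OS and the filesystem, so treat this as something to verify on your platform rather than a guarantee.

```python
import os


def open_append(path):
    """Open path for appending and return a raw file descriptor."""
    # O_BINARY exists only on Windows; it keeps "\n" from being translated.
    flags = os.O_WRONLY | os.O_APPEND | os.O_CREAT | getattr(os, "O_BINARY", 0)
    return os.open(path, flags, 0o644)


def write_record(fd, msg):
    """Emit one record with a single os.write call, bypassing Python buffering."""
    data = (msg + "\n").encode("utf-8")
    written = os.write(fd, data)
    if written != len(data):
        # A short write would split the record, so report it loudly.
        raise IOError("short write: %d of %d bytes" % (written, len(data)))


if __name__ == "__main__":
    import tempfile

    # Demo against a throwaway file; each process would open its own descriptor.
    path = os.path.join(tempfile.mkdtemp(), "records.log")
    fd = open_append(path)
    try:
        write_record(fd, 'pid:%d got input "%s"' % (os.getpid(), "0"))
    finally:
        os.close(fd)
```

Because os.write goes straight to the file descriptor, there is no user-space buffer to flush, which is what makes the one-call-per-record approach possible.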
