Although this is not entirely clear from the documentation, multiprocessing synchronization primitives do in fact synchronize threads as well.
For example, if you run this code:
    import multiprocessing
    import sys
    import threading
    import time

    lock = multiprocessing.Lock()

    def f(i):
        # Both threads contend for the same multiprocessing.Lock.
        with lock:
            for _ in range(10):
                sys.stderr.write(i)
                time.sleep(1)

    t1 = threading.Thread(target=f, args=['1'])
    t2 = threading.Thread(target=f, args=['2'])
    t1.start()
    t2.start()
    t1.join()
    t2.join()
... the output will always be 11111111112222222222 or 22222222221111111111, never an interleaving of the two.
The locks are implemented on top of Win32 kernel synchronization objects on Windows, POSIX semaphores on POSIX platforms that support them, and not at all on other platforms. (You can verify this with import multiprocessing.synchronize, which raises an ImportError on those other platforms, as described in the docs.)
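As a quick sketch, you can probe which case your platform falls into by attempting that import:

    try:
        import multiprocessing.synchronize
    except ImportError:
        # Platforms without a working sem_open() (and not Windows)
        # disable this module entirely.
        print("multiprocessing locks are not available here")
    else:
        print("multiprocessing locks are available")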
That being said, it is safe to layer the two kinds of locks, as long as you always acquire them in a consistent order: never take a threading.Lock unless you can guarantee that your process already holds the multiprocessing.Lock.
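A minimal sketch of that ordering rule (the names processlock and threadlock are just illustrative):

    import multiprocessing
    import threading

    processlock = multiprocessing.Lock()
    threadlock = threading.Lock()

    def safe():
        # Consistent order: cross-process lock first, in-process lock second.
        with processlock:
            with threadlock:
                pass  # touch shared state here

    def risky():
        # Reversed order: holding threadlock while waiting on processlock
        # can deadlock (classic ABBA) against code that follows the rule above.
        with threadlock:
            with processlock:
                pass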
If you do it cleverly, this can even improve performance. (Cross-process locks on Windows, and on some POSIX platforms, can be orders of magnitude slower than in-process locks.)
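One way that can play out, sketched below under the same ordering rule (all names here are hypothetical): take the expensive cross-process lock once per batch, and let the threads inside that critical section coordinate with the cheap in-process lock.

    import multiprocessing
    import threading

    processlock = multiprocessing.Lock()  # expensive: kernel object / POSIX semaphore
    threadlock = threading.Lock()         # cheap: never leaves the process

    results = []  # shared only by threads of this process

    def worker(item):
        # Per-item coordination uses only the cheap lock; this is safe
        # because processlock is already held by batch() below.
        with threadlock:
            results.append(item * 2)  # stand-in for real work

    def batch(items):
        # One acquisition of the expensive lock per batch,
        # instead of one per item.
        with processlock:
            threads = [threading.Thread(target=worker, args=(i,)) for i in items]
            for t in threads:
                t.start()
            for t in threads:
                t.join()

    batch(range(100))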
If you just do it in the obvious way (only ever taking with threadlock: inside with processlock: blocks), it obviously won't help performance, and will actually slow things down a bit (though probably not enough to matter), and it won't add any direct benefits. Of course, readers who don't know that multiprocessing locks work between threads will still be able to see that your code is correct, and in some cases debugging in-process deadlocks can be much easier than debugging cross-process deadlocks... but I don't think either of those is a sufficient reason for the extra complexity in most cases.
abarnert