What happens if I log to a single file from several different processes in Python?

I spent several hours figuring this out, starting with these related questions:

  • Atomicity of `write(2)` to a local file system
  • How can I synchronize, i.e. make writes to one file from two processes atomic?
  • How to programmatically determine whether the "write" system call is atomic on a particular file?
  • What happens if the write system call is invoked on the same file by two different processes at the same time?
  • http://article.gmane.org/gmane.linux.kernel/43445

It seems that when the file is opened with the O_APPEND flag, writing to one file from several processes will always be fine on Linux. And I believe Python uses the O_APPEND flag in its logging module, since FileHandler opens files in append mode by default.
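One way to check that belief on Linux (my addition, not in the original post) is to inspect the flags on the descriptor behind a FileHandler:

import fcntl
import logging
import os

# FileHandler's default mode is 'a' (append), which should translate
# to O_APPEND on the underlying file descriptor.
handler = logging.FileHandler('spam.log')
flags = fcntl.fcntl(handler.stream.fileno(), fcntl.F_GETFL)
print('O_APPEND set: %s' % bool(flags & os.O_APPEND))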

And from a little test:

#!/bin/env python
import os
import logging

logger = logging.getLogger('spam_application')
logger.setLevel(logging.DEBUG)
# create file handler which logs even debug messages
fh = logging.FileHandler('spam.log')
logger.addHandler(fh)
formatter = logging.Formatter(
    '%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fh.setFormatter(formatter)

for i in xrange(10000):
    p = os.getpid()
    logger.debug('Log line number %s in %s', i, p)

And I launched it with

 ./test.py & ./test.py & ./test.py & ./test.py & 

I found nothing wrong with spam.log, which seems to support the conclusion above.
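To double-check that nothing interleaved (my own verification sketch, not part of the original test), every line can be matched against the configured format and counted per PID:

import re
from collections import Counter

# Any interleaved or truncated line would fail this pattern,
# which mirrors the formatter configured in the test script.
pattern = re.compile(
    r'^[\d-]+ [\d:,]+ - spam_application - DEBUG - '
    r'Log line number (\d+) in (\d+)$')

counts = Counter()
with open('spam.log') as f:
    for line in f:
        m = pattern.match(line.rstrip('\n'))
        if m is None:
            print('garbled: %r' % line)
        else:
            counts[m.group(2)] += 1

# With four concurrent runs, expect four PIDs with 10000 lines each.
for pid, n in sorted(counts.items()):
    print('%s %d' % (pid, n))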

But some questions remain:

  • What does this mean here?
  • And what are the scenarios for using this, just rotating the file?

Finally, when two processes write to the same file, I mean they both call write(2) on the same file: what guarantees that the data from the two processes does not interleave (the kernel or the file system?), and how? [NOTE: I just want to see the mechanism behind the syscall; any hint on this is welcome.]

EDIT1:

Do this and this exist just for compatibility across OS environments such as Windows, Linux, or Mac?

EDIT2:

One more test: feed 8K-long lines to logging.debug each time (see the sketch below). This time I do see "interleaved" behavior in spam.log. This behavior is exactly what the PIPE_BUF passage on the page above specifies. So it seems the behavior is clear on Linux: with O_APPEND, write(2) is fine as long as the size of a single write is less than PIPE_BUF.
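The second test keeps the setup of the first script and changes only the loop body, padding each message to about 8K (the 'x' filler is my guess at how the lines were enlarged):

# Same logger setup as the first script; each message now carries
# an ~8K payload, which exceeds PIPE_BUF (4096 on Linux).
for i in xrange(10000):
    p = os.getpid()
    logger.debug('Log line number %s in %s %s', i, p, 'x' * 8192)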

3 answers

I dug deeper and deeper, and now I think it is clear:

  • With O_APPEND, concurrent write(2) calls from several processes are fine. The order of the lines is undefined, but lines neither interleave nor overwrite each other. And this holds for writes of any size, according to Niall Douglas's answer to Understanding concurrent file writes from multiple processes. I tested this "any size" claim on Linux and found no upper limit, so it appears to be correct (see the sketch after this list).

  • Without O_APPEND, it will be a mess. Here is what POSIX says: "This volume of POSIX.1-2008 does not specify behavior of concurrent writes to a file from multiple processes. Applications should use some form of concurrency control."

  • Now we come to Python. The 8K from my test in EDIT2 has an explanation: Python's write() uses fwrite(3), and my Python build sets the buffer size (BUFSIZ) to 8192, according to abarnert's answer to Default buffer size for a file on Linux. This 8192 has a long history.
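As a concrete check of the first point above, here is a minimal sketch (my own, not from the original answer): each process opens the file with O_APPEND and issues one raw write(2) per line, with lines far larger than PIPE_BUF. Run several instances concurrently and inspect the output file for interleaved lines.

import os

# One raw write(2) per line; O_APPEND, no stdio buffering involved.
# 64K per line is well beyond PIPE_BUF (4096 on Linux).
fd = os.open('append_test.log', os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
pid = os.getpid()
for i in range(1000):
    line = 'pid=%d line=%d %s\n' % (pid, i, 'x' * 65536)
    os.write(fd, line.encode('ascii'))
os.close(fd)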

However, more information is appreciated.


I would not rely on tests here. Strange things happen only in race conditions, and probing for a race condition with a test makes almost no sense, since the race is unlikely to occur at all. So it can work well for 1000 test runs and then break randomly in production... The page you are quoting says:

logging to a single file from multiple processes is not supported, because there is no standard way to serialize access to a single file across multiple processes in Python

This does not mean that it will break... it may even be safe with a particular implementation on a particular file system. It just means that it can break, without any hope of a fix, on any other Python version or any other file system.

If you really want to be sure, you will have to dive into the Python source code (for your version) to check how logging is actually performed, and to verify that it is safe on your file system. And you will always be threatened by the possibility that a later optimization in the logging module violates your assumptions.

IMHO, that is the reason for the warning in the logging documentation, and for the existence of a dedicated module for parallel logging to the same file. The latter does not rely on anything undefined, but simply uses explicit locking.
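The explicit-locking idea can be sketched like this (my illustration of the concept, not that module's actual code): take an exclusive flock around every emitted record, so writes from different processes cannot interleave.

import fcntl
import logging

class LockingFileHandler(logging.FileHandler):
    # Sketch only: real concurrent-logging handlers are more careful
    # (file rotation, timeouts, Windows support, etc.).
    def emit(self, record):
        fcntl.flock(self.stream.fileno(), fcntl.LOCK_EX)
        try:
            super(LockingFileHandler, self).emit(record)
        finally:
            fcntl.flock(self.stream.fileno(), fcntl.LOCK_UN)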


I tried similar code like this (in Python 3):

import threading

# function_to_call_logger is a placeholder for whatever function
# performs the actual logging call.
for i in range(0, 100000):
    t1 = threading.Thread(target=function_to_call_logger, args=(i,))
    t1.start()

For me this worked perfectly fine; a similar problem has been resolved here.

This took a lot of CPU time, but not much memory.

EDIT:
"Fine" means that everything requested was logged, but the ordering was missing; the race condition is still not fixed.



