Writing to a remote file: When does the write() function really return?

I have a client node that writes a file to a hard drive located on another node (I am actually writing to a parallel file system).

What I want to understand:
When I call write() (or pwrite()), when does the call actually return?

I see three possibilities:

  • write returns immediately after queuing the I/O operation on the client side:
    In this case, write can return before the data has actually left the client. (When writing to a local hard drive, write() uses delayed writes: the data is simply placed in the write queue. Does the same happen when writing to a remote disk?) I wrote a test in which I write a large matrix (1 GByte) to a file. Without fsync it showed very high bandwidth, while with fsync the results looked more realistic, so it seems delayed writes are being used (a sketch of such a timing test is shown after this list).

  • write returns after the data has been transferred to the server's buffer:
    Now the data is on the server, but only in a buffer in its main memory; it has not yet been stored on the hard disk. In this case, the I/O time should include the time needed to transfer the data over the network.

  • write returns after the data has actually been stored on the hard drive:
    Which, I am fairly sure, does not happen by default (unless you write files so large that your RAM fills up and the data has to be flushed out, etc.).
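
For reference, here is a minimal sketch of the kind of timing test mentioned in the first bullet: it compares the apparent bandwidth of plain write() calls with that of write() followed by fsync(). The file path, chunk size, and use of clock_gettime() are my own assumptions for illustration, not details from the original test.

    /* Timing sketch: apparent write() bandwidth with and without fsync(). */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    #define TOTAL (1024L * 1024L * 1024L)   /* ~1 GByte, as in the test above */
    #define CHUNK (4L * 1024L * 1024L)      /* 4 MiB per write() call (assumed) */

    static double now_sec(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void)
    {
        char *buf = malloc(CHUNK);
        if (!buf) return 1;
        memset(buf, 'x', CHUNK);

        /* Hypothetical path on the parallel file system. */
        int fd = open("/mnt/pvfs2/testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        double t0 = now_sec();
        for (long done = 0; done < TOTAL; done += CHUNK)
            if (write(fd, buf, CHUNK) != CHUNK) { perror("write"); return 1; }
        double t1 = now_sec();               /* last write() has returned */

        if (fsync(fd) != 0) { perror("fsync"); return 1; }
        double t2 = now_sec();               /* data committed to stable storage */

        printf("write() only   : %8.1f MB/s\n", TOTAL / 1e6 / (t1 - t0));
        printf("write() + fsync: %8.1f MB/s\n", TOTAL / 1e6 / (t2 - t0));
        close(fd);
        free(buf);
        return 0;
    }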

In addition, I would like to be sure about the following: can there be a situation where the program terminates without any data having actually left the client node, so that network parameters such as latency and bandwidth, and the bandwidth of the hard disk, do not show up in the program's execution time at all? Assume we do not call fsync() or anything like that.

EDIT: I am using the pvfs2 parallel file system

2 answers

Option 3 is, of course, simple and safe. However, a POSIX-compliant parallel file system with performance good enough that anyone actually cares to use it will typically use option 1, combined with some more or less involved mechanism to avoid conflicts when, for example, several clients cache the same file.

As the saying goes, "there are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors."

If the file system is supposed to be POSIX compliant, you need to go and study POSIX file system semantics, and look up how the fs supports them while still getting good performance (or, alternatively, which parts of POSIX semantics it skips, a la NFS). What makes this interesting is that POSIX fs semantics go back to the 1970s, with little if any thought given to how one would support network file systems.

I don't know about pvfs2 specifically, but in general, in order to comply with POSIX and provide decent performance, option 1 can be used together with some kind of cache coherency protocol (Lustre, for example, does this). For fsync(), the data must actually be transferred to the server and committed to stable storage on the server (a disk or a battery-backed write cache) before fsync() returns. And, of course, the client has some limit on the number of dirty pages, beyond which it blocks further write() calls to the file until some of them have been transferred to the server.
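
As a concrete illustration of that point, the usual POSIX pattern for making sure the data has really left the client before the program exits is to call fsync() and check its return value (as well as close()'s), since an error on a cached write may only surface there. This is a general sketch under that assumption, not anything pvfs2-specific; the file name is made up:

    /* Sketch: force buffered data out before exit and catch deferred errors. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        const char msg[] = "result data\n";
        /* Hypothetical output file on the remote/parallel file system. */
        int fd = open("/mnt/pvfs2/output.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* This may "succeed" even though the data is only cached so far. */
        if (write(fd, msg, sizeof msg - 1) != (ssize_t)(sizeof msg - 1)) {
            fprintf(stderr, "short or failed write\n");
            return 1;
        }

        /* fsync() does not return until the data has been transferred to the
         * server and committed to stable storage, so this is also where an
         * error on a cached write can finally be reported. */
        if (fsync(fd) != 0) { perror("fsync"); return 1; }
        if (close(fd) != 0) { perror("close"); return 1; }
        return 0;
    }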


You can get any of the three behaviours. It depends on the flags you pass to open(). It depends on how the file system was mounted locally. It also depends on how the remote server is configured.

All of this is from a Linux point of view; Solaris and others may vary.

Some important open() flags are: O_SYNC, O_DIRECT, O_DSYNC, O_RSYNC.
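
For example, here is a minimal sketch of how O_SYNC changes the behaviour (the file path is an arbitrary assumption): with O_SYNC, each write() only returns once the data and required metadata have been committed to stable storage, which is essentially option 3 from the question. O_DSYNC requires only the data to be stable, and O_DIRECT additionally bypasses the client page cache but usually requires block-aligned buffers and offsets.

    /* Sketch: synchronous writes via O_SYNC. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Hypothetical path on the NFS mount. */
        int fd = open("/mnt/nfs/synced.dat",
                      O_WRONLY | O_CREAT | O_SYNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        const char buf[] = "hello";
        /* With O_SYNC this behaves roughly like write() followed by fsync():
         * the call does not return until the data is on stable storage. */
        if (write(fd, buf, sizeof buf - 1) != (ssize_t)(sizeof buf - 1))
            perror("write");

        close(fd);
        return 0;
    }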

Some important NFS mount flags are: ac, noac, cto, nocto, lookupcache, sync, async.

Some important NFS export flags are: sync, async, no_wdelay. And, of course, the mount flags of the file system being exported over NFS matter as well. For example, if you exported XFS or EXT4 from Linux and, for some reason, used the nobarrier mount option, a power loss on the server side would almost certainly result in lost data.

