Atomic file modification

There is a region in a file (possibly small) that I want to overwrite. Suppose I call fseek, fwrite, fsync. Is there any way to make such an overwrite atomic? I must be sure that, whatever failure occurs, the region will contain either only the old (pre-modification) data or only the new data, never a mix of the two.

There are two things that I want to highlight.

First. It is fine if a region of ARBITRARY size cannot be written atomically: we can handle that by appending the new data to the file, fsync'ing, then rewriting a "pointer" area in the file, and fsync'ing again. However, if writing the pointer is itself not atomic, we can still end up with a damaged file containing an invalid pointer.
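A minimal sketch of this append-then-repoint scheme, written in Python for brevity (f.seek, f.write and os.fsync stand in for fseek, fwrite and fsync; f.flush plays the role of fflush, which is needed because fwrite buffers in user space). The file layout (an 8-byte pointer at offset 0, records prefixed with a 4-byte length) and all function names are assumptions made for illustration only.

```python
import os
import struct
import tempfile

POINTER = struct.Struct("<Q")  # hypothetical header: 8-byte offset of the live record
LENGTH = struct.Struct("<I")   # hypothetical record prefix: 4-byte payload length


def durable_sync(f):
    f.flush()             # push the userspace buffer to the OS (like fflush)
    os.fsync(f.fileno())  # ask the OS to push its cache to the device


def create(path, payload):
    with open(path, "wb") as f:
        f.write(POINTER.pack(POINTER.size))  # first record sits right after the pointer
        f.write(LENGTH.pack(len(payload)) + payload)
        durable_sync(f)


def replace_record(path, payload):
    with open(path, "r+b") as f:
        # Step 1: append the new data; the old record is left untouched.
        f.seek(0, os.SEEK_END)
        new_offset = f.tell()
        f.write(LENGTH.pack(len(payload)) + payload)
        durable_sync(f)
        # Step 2: redirect the pointer. This 8-byte overwrite is the step
        # whose atomicity is in question.
        f.seek(0)
        f.write(POINTER.pack(new_offset))
        durable_sync(f)


def read_record(path):
    with open(path, "rb") as f:
        (offset,) = POINTER.unpack(f.read(POINTER.size))
        f.seek(offset)
        (length,) = LENGTH.unpack(f.read(LENGTH.size))
        return f.read(length)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "data.bin")
    create(demo_path, b"old")
    replace_record(demo_path, b"new data")
    print(read_record(demo_path))
```

A crash between the two steps leaves the pointer on the old record, so the old data stays readable; the only window for corruption is a partially written pointer in step 2.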

Second. I am sure that a 1-byte write is atomic: I will never see a byte in the file that I did not put there. So we can use a trick: allocate two slots for the pointer plus a 1-byte switch that selects the active slot. Rewriting the region then goes like this: append the new data, sync, write the new pointer into the unused slot, sync again, then rewrite the switch byte and sync once more. The rewrite operation thus costs at least 3 fsync calls.
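The two-slot variant could be sketched as follows, again in Python standing in for fseek/fwrite/fsync. The layout (switch byte at offset 0, two 8-byte slots after it, length-prefixed records appended behind the header) and every name here are hypothetical; the sketch also simply takes the 1-byte-atomicity assumption above for granted.

```python
import os
import struct
import tempfile

SLOT = struct.Struct("<Q")         # hypothetical 8-byte pointer slot
LENGTH = struct.Struct("<I")       # hypothetical 4-byte payload length prefix
SWITCH_OFFSET = 0                  # 1-byte switch: 0 or 1, selects the active slot
SLOT_OFFSETS = (1, 1 + SLOT.size)  # file offsets of slot 0 and slot 1
HEADER_SIZE = 1 + 2 * SLOT.size


def durable_sync(f):
    f.flush()
    os.fsync(f.fileno())


def create(path, payload):
    with open(path, "wb") as f:
        f.write(b"\x00")                 # switch selects slot 0
        f.write(SLOT.pack(HEADER_SIZE))  # slot 0 points at the first record
        f.write(SLOT.pack(0))            # slot 1 is unused for now
        f.write(LENGTH.pack(len(payload)) + payload)
        durable_sync(f)


def replace_record(path, payload):
    with open(path, "r+b") as f:
        f.seek(SWITCH_OFFSET)
        active = f.read(1)[0]
        spare = 1 - active
        # Sync 1: append the new data.
        f.seek(0, os.SEEK_END)
        new_offset = f.tell()
        f.write(LENGTH.pack(len(payload)) + payload)
        durable_sync(f)
        # Sync 2: write the pointer into the slot nobody is reading.
        f.seek(SLOT_OFFSETS[spare])
        f.write(SLOT.pack(new_offset))
        durable_sync(f)
        # Sync 3: flip the 1-byte switch to publish the new slot.
        f.seek(SWITCH_OFFSET)
        f.write(bytes([spare]))
        durable_sync(f)


def read_record(path):
    with open(path, "rb") as f:
        active = f.read(1)[0]
        f.seek(SLOT_OFFSETS[active])
        (offset,) = SLOT.unpack(f.read(SLOT.size))
        f.seek(offset)
        (length,) = LENGTH.unpack(f.read(LENGTH.size))
        return f.read(length)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "data.bin")
    create(demo_path, b"old")
    replace_record(demo_path, b"new data")
    print(read_record(demo_path))
```

A torn write in the second step only damages the spare slot, which no reader follows until the switch byte is flipped.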

All this would be much easier if I had an atomic write for a long (a multi-byte value), but do I really have it?

Is there a way to handle this situation without the method described in the second point?

Another question: is there any ordering guarantee between writes and sync? For example, if I call fseek, fwrite [1], fseek, fwrite [2], fsync, can it happen that write [2] reaches the disk while write [1] does not?

This question applies to both Linux and Windows; a platform-specific answer (for example, "on Ubuntu version a.b.c ...") is welcome too.

1 answer

It is usually safe to assume that a hard drive writes a 512-byte chunk in a single operation. However, I would not rely on that. Instead, I would go with your second solution, adding a checksum to your record and verifying it before changing the pointer in the file.

It is generally recommended that you add a checksum to everything that is written to disk.
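A minimal sketch of a checksummed record, assuming a hypothetical layout in which each record carries its payload length and a CRC-32 (zlib.crc32) and an 8-byte pointer at offset 0 selects the live record. The writer verifies the new record before it touches the pointer, and the reader rejects any record whose checksum does not match. Note that the read-back after fsync may be served from the OS cache, so it is a sanity check rather than proof of what is on the platter.

```python
import os
import struct
import tempfile
import zlib

POINTER = struct.Struct("<Q")         # hypothetical header: offset of the live record
RECORD_HEADER = struct.Struct("<II")  # hypothetical: payload length, CRC-32 of payload


def durable_sync(f):
    f.flush()
    os.fsync(f.fileno())


def encode_record(payload):
    return RECORD_HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def load_record(f, offset):
    """Return the payload at offset, or None if it is truncated or corrupt."""
    f.seek(offset)
    header = f.read(RECORD_HEADER.size)
    if len(header) < RECORD_HEADER.size:
        return None
    length, crc = RECORD_HEADER.unpack(header)
    payload = f.read(length)
    if len(payload) != length or zlib.crc32(payload) != crc:
        return None
    return payload


def create(path, payload):
    with open(path, "wb") as f:
        f.write(POINTER.pack(POINTER.size))
        f.write(encode_record(payload))
        durable_sync(f)


def replace_record(path, payload):
    with open(path, "r+b") as f:
        f.seek(0, os.SEEK_END)
        new_offset = f.tell()
        f.write(encode_record(payload))
        durable_sync(f)
        # Verify the record before changing the pointer.
        if load_record(f, new_offset) != payload:
            raise OSError("new record failed checksum verification")
        f.seek(0)
        f.write(POINTER.pack(new_offset))
        durable_sync(f)


def read_record(path):
    with open(path, "rb") as f:
        (offset,) = POINTER.unpack(f.read(POINTER.size))
        return load_record(f, offset)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "data.bin")
    create(demo_path, b"old")
    replace_record(demo_path, b"new data")
    print(read_record(demo_path))
```

The pointer itself is not protected in this sketch; it could be checksummed the same way, or stored in two slots as in the question's second scheme.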

As for the guarantee around synchronization: you can generally assume it. How a filesystem sync behaves depends on the disk, but let's say we are talking about a "sane" implementation.

  • After the first sync, the data has been flushed to the disk (it may still be sitting in the disk's own cache), and if you read the data back you are expected to get everything you wrote.
  • If, after the second sync, the data from both syncs is still in the disk cache, the situation you describe can occur, but IMHO the probability of this is very low.

In any case, there is no other mechanism that promises you the data is on disk. That is why you should have checksums.

Additional Information: Make sure fsync has completed its work.
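In C this means checking return values: fsync returns -1 and sets errno on failure, and data written with fwrite has to be pushed out with fflush before fsync can see it. A rough Python equivalent is sketched below; overwrite_durably is a hypothetical helper name, and it uses unbuffered os.write so that no userspace buffer sits between the write and the sync.

```python
import os
import tempfile


def overwrite_durably(path, offset, data):
    """Overwrite bytes at offset and sync; raises OSError if any step fails."""
    # O_BINARY only exists on Windows, hence the getattr fallback.
    fd = os.open(path, os.O_RDWR | getattr(os, "O_BINARY", 0))
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # may write fewer bytes than requested
            view = view[written:]
        os.fsync(fd)  # raises OSError if the OS reports a failure
    finally:
        os.close(fd)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "data.bin")
    with open(demo_path, "wb") as f:
        f.write(b"0123456789")
    try:
        overwrite_durably(demo_path, 2, b"AB")
    except OSError as exc:
        # Treat the region as not durably updated.
        print("sync failed:", exc)
```

If the call raises, the caller should assume the new data is not guaranteed to be on disk and fall back to whatever the checksums still validate.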


Source: https://habr.com/ru/post/924575/

