Sorry for adding something to an old thread, and for such a long post.
I know only one way to do a completely race-condition-free rename() without overwriting in the absence of locking, and it should work on practically any file system, even on NFS with intermittent server reboots and client time warps.
The following recipe is race-condition-free in the sense that under no circumstances can data be lost. It also needs no locks and can be performed by clients that do not otherwise cooperate, as long as they all use the same algorithm.
It is not race-condition-free in the sense that everything is left in a clean state if something seriously breaks. It also has a short window in which neither the source nor the destination is present at its place, although the source still exists under a different name. And it is not hardened against cases where an attacker tries to provoke harm (rename() is the culprit, go figure).
S is the source, D is the destination, P(x) is dirname(x), and C(x,y) is the path concatenation x/y.
- Check that the destination does not exist. This is just to make sure that we do not take the following steps in vain.
- Create a probably unique name T := C(P(D), random)
- mkdir(T); if this fails, loop back to the previous step
- open(C(T,"lock"), O_EXCL); if this fails, rmdir(T) ignoring errors and loop back to the previous step
- rename(S, C(T,"tmp"))
- link(C(T,"tmp"), D)
- unlink(C(T,"tmp"))
- unlink(C(T,"lock"))
- rmdir(T)
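The steps above can be sketched in Python roughly as follows. This is only an illustration under a few assumptions of mine: the helper name claim_temp_dir, the ".rename-" prefix, the retry limit, and the cleanup after a failed rename() are not part of the recipe, and the sketch picks a fresh name right away when the lock file cannot be created. It handles only the happy path, so a failing link() leaves the source hidden in T. Because it relies on link(), it works for regular files, not for directories.

```python
import errno
import os
import secrets


def claim_temp_dir(dest, attempts=100):
    """Create a private directory next to dest and claim it with a lock file."""
    parent = os.path.dirname(dest) or "."
    for _ in range(attempts):
        # Probably unique name T := C(P(D), random); the prefix is an assumption.
        tmp_dir = os.path.join(parent, ".rename-" + secrets.token_hex(8))
        try:
            os.mkdir(tmp_dir)
        except FileExistsError:
            continue  # name already taken, pick another one
        try:
            # O_EXCL only has an effect together with O_CREAT.
            fd = os.open(os.path.join(tmp_dir, "lock"),
                         os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
        except OSError:
            try:
                os.rmdir(tmp_dir)  # ignore errors, somebody else may own it
            except OSError:
                pass
            continue
        os.close(fd)
        return tmp_dir
    raise OSError(errno.EAGAIN, "could not claim a temporary directory")


def safe_rename(src, dest):
    """Move src to dest without overwriting dest (happy path only)."""
    # Early check, only to avoid doing the remaining steps in vain.
    if os.path.lexists(dest):
        raise FileExistsError(errno.EEXIST, os.strerror(errno.EEXIST), dest)
    tmp_dir = claim_temp_dir(dest)
    lock = os.path.join(tmp_dir, "lock")
    hidden = os.path.join(tmp_dir, "tmp")
    try:
        os.rename(src, hidden)  # hide the source in our private directory
    except OSError:
        os.unlink(lock)  # nothing was moved, so undo the claim (my addition)
        os.rmdir(tmp_dir)
        raise
    os.link(hidden, dest)  # raises FileExistsError instead of overwriting
    os.unlink(hidden)
    os.unlink(lock)
    os.rmdir(tmp_dir)
```

Calling safe_rename("a", "b") moves a to b and raises FileExistsError when b is already there.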
The algorithm safe_rename(S,D) explained:
The problem is that we want to be sure that there is no race condition, neither at the source nor at the destination. We assume that (almost) anything can happen between any two steps, but that all other processes follow exactly the same algorithm when they do a race-condition-free rename. This includes that the temporary directories T are never touched, except after making sure (this is a manual process) that the process using the directory has died and cannot be resurrected (for example, by resuming a hibernated VM after a restore).
To do the rename() properly, we need a place to hide. So we construct a directory in a way that ensures nobody else (who follows the same algorithm) is using it.
However, mkdir() is not guaranteed to be atomic on NFS. Therefore, we need some guarantee that we are alone in the directory. This is the O_EXCL on the lock file. Strictly speaking, this is not a lock but a semaphore.
Except in such rare cases, mkdir() is usually atomic. We can also use a cryptographically secure random name for the directory and add a GUID, the host name, and the PID, so that it is very unlikely that anyone else picks the same name by chance. However, to prove that the algorithm is correct, we need this file called lock.
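A name built from those ingredients could look like the following sketch. The function name unique_dir_name, the ".safe_rename" prefix, and the order of the parts are my assumptions; only the ingredients (secure random value, GUID, host name, PID) come from the text.

```python
import os
import secrets
import socket
import uuid


def unique_dir_name(dest):
    """Build a probably unique directory name in the parent directory of dest."""
    parent = os.path.dirname(dest) or "."
    # Guard against a host name that contains a path separator.
    host = socket.gethostname().replace(os.sep, "_")
    parts = [
        ".safe_rename",         # hypothetical prefix, eases manual cleanup
        host,                   # distinguishes clients of a shared file system
        str(os.getpid()),       # distinguishes processes on one host
        str(uuid.uuid4()),      # GUID
        secrets.token_hex(16),  # cryptographically secure random part
    ]
    return os.path.join(parent, "-".join(parts))
```

The name only makes a collision unlikely. The mkdir() plus O_EXCL lock file is still what establishes that we are alone in the directory.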
Now that we have a mostly empty directory, we can safely rename() the source into it. This ensures that no one else changes the source until we unlink() it. (Well, the contents may change, but that is not a problem.)
Now the link() trick can be applied to make sure that we do not overwrite the destination: link() fails if the destination already exists instead of replacing it.
Afterwards, the unlink() on the remaining source can be done free of race conditions. The rest is cleanup.
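The link() trick in isolation, as a small Python sketch. The function name link_without_overwrite and the boolean return value are my own choices for illustration.

```python
import os


def link_without_overwrite(hidden, dest):
    """Give hidden a second name dest; never replace an existing dest.

    Returns True if dest was created, False if something already exists there.
    """
    try:
        os.link(hidden, dest)  # fails with EEXIST if dest exists
    except FileExistsError:
        return False
    return True
```

Unlike rename(), which silently replaces an existing destination, this reports the collision and leaves both files untouched.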
There is only one problem left:
If link() fails, we have already moved the source. For a proper cleanup, we must move it back. This can be done by calling safe_rename(C(T,"tmp"),S). If this fails, too, all we can do is clean up as much as possible (unlink(C(T,"lock")), rmdir(T)) and leave the debris behind for manual cleanup by an administrator.
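The failure path can be sketched like this in Python. It assumes T has already been claimed and the source already sits at C(T,"tmp"). The function name link_or_roll_back is hypothetical, and the safe_rename implementation is passed in as a parameter so that the sketch stays self-contained.

```python
import os


def _quietly(func, path):
    """Call func(path) and ignore any OSError (best-effort cleanup)."""
    try:
        func(path)
    except OSError:
        pass


def link_or_roll_back(tmp_dir, src, dest, safe_rename):
    """Finish the move; if link() fails, move the source back to src."""
    hidden = os.path.join(tmp_dir, "tmp")
    lock = os.path.join(tmp_dir, "lock")
    try:
        os.link(hidden, dest)
    except OSError:
        try:
            safe_rename(hidden, src)  # try to put the source back
        finally:
            # Clean up as much as possible. If the move back failed, rmdir()
            # fails too and the directory stays behind for an administrator.
            _quietly(os.unlink, lock)
            _quietly(os.rmdir, tmp_dir)
        raise  # report the original link() failure to the caller
    os.unlink(hidden)
    os.unlink(lock)
    os.rmdir(tmp_dir)
```

If the move back itself raises, that error propagates instead, and the file remains at C(T,"tmp") so that no data is lost.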
Concluding remarks:
To help with cleaning up debris, you can use a more meaningful file name than tmp. Choosing the names cleverly can also harden the algorithm against attacks somewhat.
And if you are moving heaps of files, you can reuse the directory, of course.
However, I agree that this algorithm is plain overkill, and that something like O_EXCL on rename() is missing.