Is it safe to read() from a file as soon as write() returns?



I have a very specific application where I need an auto-increment variable with persistent storage.

To be precise, I store the decimal representation of an int variable in a file. To generate the next number, I read() from the file, convert the contents back to an int, add 1, and write() the result back to the file. I do not need concurrent access to this data: only one thread of one process calls the functions that fetch the auto-increment number. The program runs in an embedded environment where no one will have access to the console, so security should not be a concern. In case it matters, it runs on Linux 2.6.24 on MIPS.
The problem is that I do not get 100% reproducible results. Sometimes I get duplicate numbers, which is not acceptable for my application.

My implementation is as follows.

When starting the application, I have:

 int fd = open("myfile", O_RDWR|O_CREAT|O_SYNC, S_IRWXU|S_IRWXG|S_IRWXO); 

And the auto-increment functions:

    int get_current(int fd)
    {
        char value[SIZE];
        lseek(fd, 0, SEEK_SET);
        read(fd, value, SIZE);
        return atoi(value);
    }

    int get_next(int fd)
    {
        char value[SIZE];
        int cur = get_current(fd);
        memset(value, 0, SIZE);
        sprintf(value, "%d", cur + 1);
        lseek(fd, 0, SEEK_SET);
        write(fd, value, SIZE);
        //fsync(fd); /* Could inserting this be the solution? */
        return (cur + 1);
    }

I deliberately omitted the error checking above to keep the code readable. My real code checks the return values of all system calls.

The code was originally written by someone else, and now that I have discovered this problem, the first step toward solving it is to find out what might be causing it. I am concerned that it could be related to file access caching. I know that when I write() I have no guarantee that the data has actually reached the physical medium, but is it safe to call read() without calling fsync() first and still get predictable results? If it is, then I am out of ideas ;)

Thanks for reading.

+1
c linux
3 answers

Yes, it is safe to read immediately after writing. On Unix-like systems, the data is safely in the kernel buffer pool by the time write() returns, and it will be returned to any other process that reads it. The same applies when using O_SYNC, O_DSYNC, or O_FSYNC (which ensure the data is written to disk), and on Windows systems. Obviously, an asynchronous write is not complete when aio_write() returns; it is complete only once its completion has been reported.

However, your problem arises from the fact that you do not guarantee that only one process or thread accesses the file at a time. You need to make sure access is serialized, so that two processes (or threads) cannot read from the file at the same time. This is the "lost update" problem in DBMS terms.

You need to make sure that only one process has access at a time. If your processes cooperate, you can use advisory locking (via fcntl() on POSIX systems). If your processes do not cooperate, or you are not sure, you may need mandatory locking or some other technique.
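A minimal sketch of the advisory-locking idea, assuming every cooperating process goes through the same function. The names set_lock() and get_next_locked() are hypothetical, and SIZE is an assumed value because the question does not state it. Note that fcntl() locks belong to the process, so they do not serialize threads inside one process.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SIZE 16 /* assumed buffer size; the question does not give its value */

/* Take (F_WRLCK) or release (F_UNLCK) an advisory lock on the whole file. */
static int set_lock(int fd, short type)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                     /* 0 means "up to end of file" */
    return fcntl(fd, F_SETLKW, &fl);  /* F_SETLKW waits for the lock */
}

/* Hypothetical locked variant of get_next(); returns -1 on any error. */
int get_next_locked(int fd)
{
    char value[SIZE + 1];
    ssize_t n;
    int next = -1;

    if (set_lock(fd, F_WRLCK) == -1)
        return -1;

    memset(value, 0, sizeof value);   /* guarantees NUL termination */
    if (lseek(fd, 0, SEEK_SET) != (off_t)-1 &&
        (n = read(fd, value, SIZE)) >= 0) {
        int candidate = atoi(value) + 1;  /* an empty file counts as 0 */

        memset(value, 0, sizeof value);
        snprintf(value, SIZE, "%d", candidate);
        if (lseek(fd, 0, SEEK_SET) != (off_t)-1 &&
            write(fd, value, SIZE) == SIZE)
            next = candidate;
    }

    set_lock(fd, F_UNLCK);            /* release even if an I/O call failed */
    return next;
}
```

The read-modify-write sequence happens entirely while the lock is held, which is what prevents two cooperating processes from handing out the same number.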

+5

Yes, if you write() to a file and then read() from it, you should see the data you just wrote. The exceptions are if another process or thread overwrote the file in the meantime, or if the write() actually failed.
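To rule out the "write() actually failed" case, something along these lines could be used. It is only a sketch: write_all() and write_then_verify() are hypothetical helper names, and the 64-byte check buffer is an arbitrary assumption. It treats short writes as incomplete, then reads the data back with no fsync() in between.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write exactly len bytes at the current offset; 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal: retry */
            return -1;
        }
        buf += n;                    /* short write: continue with the rest */
        len -= (size_t)n;
    }
    return 0;
}

/* Write buf at offset 0 and read it straight back without fsync().
 * Returns 1 if the read saw the written bytes, 0 if not, -1 on error. */
int write_then_verify(int fd, const char *buf, size_t len)
{
    char check[64];                  /* assumed upper bound for this sketch */
    ssize_t n;

    if (len > sizeof check)
        return -1;
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1 || write_all(fd, buf, len) == -1)
        return -1;
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
        return -1;
    n = read(fd, check, len);
    if (n < 0)
        return -1;
    return (size_t)n == len && memcmp(buf, check, len) == 0;
}
```

If write_then_verify() ever returned 0 with a single writer, that would point to something else touching the file rather than to caching.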

0

File contents are a very poor way to implement an atomic counter. How large will your counter get? If it will not get huge, one simple method is to write one byte (its value does not matter) to increment the counter, and use fstat() (st_size) to read the counter. ftruncate() can reset the counter to zero.
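A rough sketch of the file-size counter, assuming a single writer as in the question. The counter_* function names are made up for illustration. With several writers, the append followed by fstat() is not one atomic step, so two callers could still observe the same size.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Open (or create) the counter file; every write goes to the end. */
int counter_open(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
}

/* The counter value is simply the file size in bytes; -1 on error. */
long counter_get(int fd)
{
    struct stat st;
    if (fstat(fd, &st) == -1)
        return -1;
    return (long)st.st_size;
}

/* Increment by appending one byte (its value is irrelevant). */
long counter_next(int fd)
{
    if (write(fd, "x", 1) != 1)
        return -1;
    return counter_get(fd);
}

/* Reset the counter to zero by truncating the file. */
int counter_reset(int fd)
{
    return ftruncate(fd, 0);
}
```

The file grows by one byte per increment, which is why this only suits counters that stay reasonably small.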

A cleaner way to implement what you need is to map the file into memory (using mmap) and store in it not only the counter but also a pthread_mutex_t initialized as process-shared, and lock it while updating the counter.
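One way this could look, as a sketch only. The struct layout and the shared_counter_* names are assumptions, not an established API. It initializes the mutex only when the file is newly created, which leaves a race if two processes create the file at the same moment, and it does not deal with a mutex left locked by a process that crashed. It may need to be linked with -pthread on older toolchains.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical on-file layout: a process-shared mutex plus the counter. */
struct shared_counter {
    pthread_mutex_t lock;
    int value;
};

/* Map the counter file, initializing it if it is new. NULL on error. */
struct shared_counter *shared_counter_map(const char *path)
{
    struct stat st;
    struct shared_counter *sc;
    int fresh;
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd == -1)
        return NULL;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return NULL;
    }
    fresh = st.st_size < (off_t)sizeof *sc;
    if (fresh && ftruncate(fd, (off_t)sizeof *sc) == -1) {
        close(fd);
        return NULL;
    }
    sc = mmap(NULL, sizeof *sc, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                        /* the mapping stays valid after close */
    if (sc == MAP_FAILED)
        return NULL;

    if (fresh) {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutex_init(&sc->lock, &attr);
        pthread_mutexattr_destroy(&attr);
        sc->value = 0;
    }
    return sc;
}

/* Increment under the lock and return the new value. */
int shared_counter_next(struct shared_counter *sc)
{
    int next;
    pthread_mutex_lock(&sc->lock);
    next = ++sc->value;
    pthread_mutex_unlock(&sc->lock);
    return next;
}
```

Because the mapping is MAP_SHARED, the counter lives in the file itself; calling msync() after an update would be the place to force it out to the medium if that matters.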

Another way with mmap is to use C1x atomics (_Atomic int), but you will have to wait 5-10 years for those. :-) Or you can use gcc intrinsics or asm for the atomic operations. This solution certainly has the best performance (slightly better than the pthread_mutex_t approach and hundreds of times faster than the write approach).
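A sketch of the gcc-intrinsics variant, assuming a compiler that provides the __sync builtins for the target (worth verifying for an older MIPS toolchain). The atomic_counter_* names are hypothetical. A newly created file is extended with zero bytes, so the counter starts at 0.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map an int-sized counter stored at the start of the file. NULL on error. */
int *atomic_counter_map(const char *path)
{
    struct stat st;
    int *p;
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd == -1)
        return NULL;
    if (fstat(fd, &st) == -1 ||
        (st.st_size < (off_t)sizeof *p &&
         ftruncate(fd, (off_t)sizeof *p) == -1)) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, sizeof *p, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                        /* the mapping stays valid after close */
    return p == MAP_FAILED ? NULL : p;
}

/* Atomically add 1 and return the new value (gcc builtin). */
int atomic_counter_next(int *counter)
{
    return __sync_add_and_fetch(counter, 1);
}

/* Ask the kernel to write the mapped page back to the file now. */
int atomic_counter_flush(int *counter)
{
    return msync(counter, sizeof *counter, MS_SYNC);
}
```

The increment itself never enters the kernel, which is where the performance advantage over the write()-based approach comes from; atomic_counter_flush() is only needed when the value must be on the medium before continuing.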

0