Fast resize of an mmap'd file

I need a copy-free resize of a very large mmap'd file, while still allowing concurrent access by reader threads.

The simple way is to use two MAP_SHARED mappings of the same file in the same process (grow the file, then create a second mapping that includes the grown region), and then unmap the old mapping once all readers that could access it have finished. However, I am curious whether the scheme below could work and, if so, whether it has any advantage.
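For reference, the two-mapping approach can be sketched roughly as follows. This is only an illustration under assumptions: `grow_and_remap` is a hypothetical helper name, the file is mapped from offset 0, and tracking when the last reader has left the old mapping is left to the caller.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: grow the file behind fd to new_size and return a
 * second MAP_SHARED mapping that covers all of it. The old mapping is not
 * touched and stays valid; the caller munmap()s it once the last reader
 * that could still be using it has finished. */
static void *grow_and_remap(int fd, size_t new_size)
{
    if (ftruncate(fd, (off_t)new_size) != 0)
        return MAP_FAILED;
    return mmap(NULL, new_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}
```

Both mappings refer to the same file pages, so a store through one is visible through the other while they coexist.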

  1. mmap the file with MAP_PRIVATE
  2. do read-only access to this memory in multiple threads
  3. either acquire a mutex for the file and write to the memory (assume this is done in a way that does not disturb readers that may be reading that memory)
  4. or acquire the mutex, but increase the size of the file and use mremap to move the mapping to a new address (resizing the mapping without copying or unnecessary file IO)
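The last step might look something like the sketch below. It is Linux-specific, since `mremap()` and `MREMAP_MAYMOVE` are not portable, and `struct mapping` and `grow_mapping` are hypothetical names used only for illustration.

```c
#define _GNU_SOURCE   /* mremap() and MREMAP_MAYMOVE are Linux-specific */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical shared state: current base address and length of the mapping. */
struct mapping {
    pthread_mutex_t lock;
    void *base;
    size_t len;
};

/* Under the mutex, grow the file and resize the mapping. The kernel is
 * allowed to move the mapping; if it does, the old address range is no
 * longer mapped once mremap() returns. Returns 0 on success, -1 on error. */
static int grow_mapping(struct mapping *m, int fd, size_t new_len)
{
    int rc = -1;
    pthread_mutex_lock(&m->lock);
    if (ftruncate(fd, (off_t)new_len) == 0) {
        void *p = mremap(m->base, m->len, new_len, MREMAP_MAYMOVE);
        if (p != MAP_FAILED) {
            m->base = p;      /* readers must recompute addresses from here */
            m->len = new_len;
            rc = 0;
        }
    }
    pthread_mutex_unlock(&m->lock);
    return rc;
}
```

Note that the mutex only serializes writers and resizers here; readers that do not take it can still be holding pointers into the old range when the mapping moves.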

The crazy part is in (4). If you move the memory, the old addresses become invalid, and readers that are still reading it may suddenly get an access violation. What if we modify the readers to catch this access violation and then restart the operation (i.e. don't re-read the bad address, but recalculate the address from the offset and the new base address returned by mremap)? Yes, I know that's evil, but to my mind the readers can only either successfully read the data at the old address or fail with an access violation and retry. If sufficient care is taken, that should be safe. And since the resizing would not happen often, the readers would eventually succeed and not get stuck in a retry loop.
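Purely to make the idea concrete, a catch-and-retry reader could be sketched like this. Everything here is an assumption for illustration: the names `install_fault_handler` and `read_byte`, the `max_tries` limit, and the choice of `sigsetjmp`/`siglongjmp` as the recovery mechanism. It does nothing about a stale address range being reused by another allocation.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>

static _Thread_local sigjmp_buf retry_point;         /* per-reader jump target */
static _Thread_local volatile sig_atomic_t reading;  /* set only around the guarded load */

static void on_fault(int sig)
{
    if (reading)
        siglongjmp(retry_point, 1);  /* abandon the faulting load */
    /* A fault outside a guarded read is a real bug: restore the default
     * action so the re-executed instruction terminates the process. */
    signal(sig, SIG_DFL);
}

static void install_fault_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGSEGV, &sa, NULL);
    sigaction(SIGBUS, &sa, NULL);
}

/* Read one byte at offset off from the mapping whose current base is
 * *base_ptr. On a fault, reload the base and try again, up to max_tries
 * attempts. Returns 0 on success, -1 if every attempt faulted. */
static int read_byte(unsigned char *volatile *base_ptr, size_t off,
                     int max_tries, unsigned char *out)
{
    volatile int tries = 0;  /* volatile: modified between sigsetjmp and siglongjmp */
    while (tries < max_tries) {
        if (sigsetjmp(retry_point, 1) == 0) {
            const volatile unsigned char *base = *base_ptr;  /* recompute address */
            reading = 1;
            *out = base[off];
            reading = 0;
            return 0;
        }
        reading = 0;
        tries++;
    }
    return -1;
}
```

The handler only jumps when the fault happened inside the guarded load, so unrelated crashes are not silently swallowed.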

A problem could arise if that old address space is reused while a reader still has a pointer into it. Then there would be no access violation, but the data would be wrong, and the program would enter the unicorn-and-candy-filled land of undefined behavior (in which there are usually neither unicorns nor candy).

But if you had full control over allocation and could make sure that any allocations that happen during this period never reuse that old address space, then this should not be a problem and the behavior should not be undefined.

Am I right? Could this work? Is there any advantage to this over using two MAP_SHARED mappings?

1 answer

It’s hard for me to imagine a case where you don’t know an upper bound on how big the file can get. Assuming you do know one, you can "reserve" address space for the maximum file size by passing that size when the file is first mapped with mmap(). Of course, any access beyond the current end of the file will cause an access violation, but that is how you would want it to work anyway: you could argue that reserving the extra address space guarantees the access violation, rather than leaving that address range open to being used by other calls to things like mmap() or malloc().

In any case, the point is that with my solution you never move the address range, you only change its size, and now your locking is around the data structure that provides the current valid size to each thread.

My solution does not work if you have so many files that the maximum mapping for each file would run you out of address space, but this is the age of the 64-bit address space, so hopefully your maximum mapping size will not be a problem.

(To make sure I wasn’t forgetting something stupid, I wrote a small program to convince myself that creating a mapping larger than the file gives an access violation when you try to access beyond the file size, and then works fine once you ftruncate() the file to be larger, all with the same address returned from the first mmap() call.)
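A minimal sketch of this reservation scheme, not the program mentioned above: `struct growable_map` and the `gm_*` functions are hypothetical names, and a mutex-protected `valid_len` stands in for "the data structure that provides the current valid size". On Linux, touching a page inside the mapping but wholly past the end of the file is reported as SIGBUS rather than SIGSEGV.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

struct growable_map {
    pthread_mutex_t lock;
    unsigned char *base;  /* never changes after gm_open() */
    size_t reserved;      /* bytes of address space reserved up front */
    size_t valid_len;     /* bytes currently backed by the file */
};

/* Map `reserved` bytes of fd even though the file is only file_len long. */
static int gm_open(struct growable_map *m, int fd, size_t file_len, size_t reserved)
{
    void *p = mmap(NULL, reserved, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    pthread_mutex_init(&m->lock, NULL);
    m->base = p;
    m->reserved = reserved;
    m->valid_len = file_len;
    return 0;
}

/* Grow the file; the mapping itself is untouched, only valid_len changes. */
static int gm_grow(struct growable_map *m, int fd, size_t new_len)
{
    int rc = -1;
    pthread_mutex_lock(&m->lock);
    if (new_len >= m->valid_len && new_len <= m->reserved &&
        ftruncate(fd, (off_t)new_len) == 0) {
        m->valid_len = new_len;
        rc = 0;
    }
    pthread_mutex_unlock(&m->lock);
    return rc;
}

/* Readers ask for the current valid size before touching the mapping. */
static size_t gm_valid_len(struct growable_map *m)
{
    pthread_mutex_lock(&m->lock);
    size_t n = m->valid_len;
    pthread_mutex_unlock(&m->lock);
    return n;
}
```

Because `base` is fixed for the lifetime of the mapping, readers never need to recompute addresses; they only have to stay below `gm_valid_len()`.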
