How to implement or emulate MADV_ZERO?

Question

I would like to be able to zero out a range of a file memory mapping without invoking any IO (in order to efficiently overwrite huge files sequentially without incurring any disk read IO).

Doing std::memset(ptr, 0, length) will cause pages to be read from disk if they are not already in memory, even when entire pages are overwritten, which completely trashes disk performance.

I would like to be able to do something like madvise(ptr, length, MADV_ZERO), which would zero out the range (similar to FALLOC_FL_ZERO_RANGE) so that accessing the specified range causes zero-fill page faults instead of regular IO page faults.

Unfortunately, MADV_ZERO does not exist, even though the corresponding flag FALLOC_FL_ZERO_RANGE does exist in fallocate and can be used with fwrite to achieve a similar effect, though without instant cross-process coherency.

One possible alternative, I would guess, is to use MADV_REMOVE. However, as I understand it, that can cause file fragmentation and also blocks other operations while it completes, which makes me unsure of its long-term performance implications. My experience with Windows is that the similar FSCTL_SET_ZERO_DATA command can incur significant performance spikes when invoked.

My question is: how could one implement or emulate MADV_ZERO for shared mappings, preferably in user mode?

1. `/dev/zero/`

I have read a suggestion to simply read /dev/zero into the selected range, though I'm not quite sure what “reading into the range” means or how to do it. Is it like an fread from /dev/zero into the memory range? I'm not sure how that would avoid a regular page fault on access.

> For Linux, simply read /dev/zero into the selected range. The kernel already optimizes this case for anonymous mappings.
> If doing it in general turns out to be too hard to implement, I propose MADV_ZERO should have this effect: exactly like reading /dev/zero into the range, but always efficient.

EDIT: Following the thread a bit further, it turns out that this will not actually work.

> It doesn't do the trick when you are dealing with a shared mapping.
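
To make “reading into the range” concrete: mechanically it means calling read(2) on /dev/zero with the target address as the destination buffer. The sketch below shows only those mechanics; the helper name read_zero_into is hypothetical, and, as quoted above, this is not expected to avoid the IO page faults for a shared file mapping.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill [addr, addr + length) with zeros by
 * read()ing from /dev/zero directly into that memory.
 * Returns 0 on success, -1 with errno set on failure.
 */
int read_zero_into(void *addr, size_t length)
{
    int fd = open("/dev/zero", O_RDONLY);
    if (fd < 0)
        return -1;

    char *p = addr;
    while (length > 0) {
        ssize_t n = read(fd, p, length);
        if (n < 0 && errno == EINTR)
            continue;                 /* interrupted, retry */
        if (n <= 0) {                 /* error or unexpected end of file */
            int saved = (n == 0) ? EIO : errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;
        length -= (size_t)n;
    }
    close(fd);
    return 0;
}
```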

2. `MADV_REMOVE`

One guess at implementing this in Linux (i.e., in the kernel rather than in the user application, which I would prefer) would be to copy and modify MADV_REMOVE, i.e., have madvise_remove use FALLOC_FL_ZERO_RANGE instead of FALLOC_FL_PUNCH_HOLE. I'm a bit in over my head guessing at this, though, especially since I don't quite understand what the code around vfs_fallocate is doing:

    // madvise.c
    static long madvise_remove(...)
    ...
        /*
         * Filesystem fallocate may need to take i_mutex. We need to
         * explicitly grab a reference because the vma (and hence the
         * vma reference to the file) can go away as soon as we drop
         * mmap_sem.
         */
        get_file(f); // Increment ref count.
        up_read(&current->mm->mmap_sem); // Release a read lock? Why?
        error = vfs_fallocate(f,
                    FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, // FALLOC_FL_ZERO_RANGE?
                    offset, end - start);
        fput(f); // Decrement ref count.
        down_read(&current->mm->mmap_sem); // Acquire read lock. Why?
        return error;
    }

c linux shared-memory mmap

ronag Aug 31 '15 at 23:56

1 answer

Basile Starynkevitch · Answer 1 · 2015-09-01T08:59:06+0000

You probably cannot do what you want (in user space, without hacking the kernel). Note that writing zero pages might not incur physical disk IO, because of the page cache.

You might want to replace the file segment with a file hole in a sparse file (though this is not exactly what you want), but some file systems (e.g. VFAT) do not support holes or sparse files. See lseek(2) with SEEK_HOLE, and ftruncate(2).
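
As a small sketch of the lseek(2) pointer above, the following checks whether a given file offset currently lies in a hole. It assumes Linux with glibc; the helper name offset_in_hole is hypothetical. On file systems without real hole support, the kernel typically reports the only hole at end of file, so the check would simply return 0 for offsets inside the file.

```c
#define _GNU_SOURCE          /* exposes SEEK_HOLE in glibc */
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if `offset` lies in a hole of the file,
 * 0 if it lies in data, and -1 with errno set on error (for example
 * ENXIO when `offset` is at or past end of file).
 * Side effect: moves the file position of fd.
 */
int offset_in_hole(int fd, off_t offset)
{
    /* SEEK_HOLE returns the start of the first hole at or after offset. */
    off_t hole = lseek(fd, offset, SEEK_HOLE);
    if (hole == (off_t)-1)
        return -1;
    return hole == offset;
}
```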
