I have a number of large files; each can be more than 100 GB, and the total amount of data can reach 1 TB. All of them are read-only (random reads only).
My program does small reads in these files on a computer with 8 GB of main memory.
To improve performance (avoiding seek() calls and buffer copies), I thought about using memory mapping, essentially mmap()-ing the whole 1 TB of data.
Although this seems crazy at first, since main memory is far smaller than the data on disk, an understanding of how virtual memory works shows that there should be no problem on 64-bit machines.
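Here is a minimal sketch of what I have in mind for a single file (POSIX mmap; the path and file name are just placeholders):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/data/bigfile.bin";   /* placeholder path */
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* Read-only mapping: pages stay clean, so the kernel can simply drop
       them under memory pressure instead of writing anything back. */
    void *base = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }
    close(fd);   /* the mapping remains valid after closing the descriptor */

    /* Small random reads become plain pointer accesses, no read()/seek(). */
    unsigned char first = ((unsigned char *)base)[0];
    printf("first byte: %u\n", first);

    munmap(base, st.st_size);
    return 0;
}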
All pages faulted in from disk to satisfy my reads will be considered "clean" by the OS, since they are never written to. This means all of these pages can go straight onto the list of pages the OS can reuse, without being written back to disk or swapped out. In effect, the OS keeps only the most recently used pages in physical memory and performs disk I/O only when a page is not already resident.
This would mean no swapping and no extra I/O caused by the huge memory mapping.
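If readahead ever works against the random access pattern, I assume an advisory hint like madvise(MADV_RANDOM) could help; a sketch:

#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

/* Advise the kernel that accesses to this mapping are random, so it can
   skip aggressive readahead. The call is purely advisory and may be ignored. */
static void hint_random_access(void *base, size_t length)
{
    if (madvise(base, length, MADV_RANDOM) != 0)
        perror("madvise");
}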
That's the theory. What I'm looking for is experience from anyone who has tried or used this approach in real production and can share it: are there any practical problems with this strategy?