Java: are there situations where disk is as fast as memory?

I am writing code to access an inverted index. I have two interchangeable classes that read the index. One reads it from disk, buffering part of it. The other loads the whole index into memory as a byte[][] (the index is about 7 GB) and reads from that two-dimensional array. One would expect better results with all the data in memory, but my measurements show that working with the index on disk is as fast as having it in memory. (The time taken to load the index into memory is not counted in the measurements.)

Why is this happening? Any ideas?

Additional information. I ran the code with HPROF profiling enabled. In both the "on disk" and the "in memory" runs, the most heavily used code is NOT the code directly related to reading. Also, to my (limited) understanding, the GC profiler does not show any problem with GC.

UPDATE #1. I instrumented my code to measure I/O time. It seems that most memory requests take 0-2000 ns, while most disk requests take 1000-3000 ns. The second figure seems too low to me. Is this due to Linux disk caching? Is there a way to exclude disk caching for benchmarking purposes?
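For reference, per-request timing of this kind could be sketched roughly as below. This is a minimal sketch and not the actual instrumentation: the class name `ReadLatencyProbe`, the 16 MB temporary file, and the 4 KB request size are assumptions made up for illustration. Since the file has just been written, its reads will very likely be served from the OS page cache rather than from the physical disk.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Random;

public class ReadLatencyProbe {

    /** Times random reads of len bytes from an in-memory array; returns sorted latencies in ns. */
    public static long[] timeMemoryReads(byte[] data, int len, int requests, long seed) {
        Random rnd = new Random(seed);
        byte[] buf = new byte[len];
        long[] nanos = new long[requests];
        for (int i = 0; i < requests; i++) {
            int pos = rnd.nextInt(data.length - len + 1);
            long t0 = System.nanoTime();
            System.arraycopy(data, pos, buf, 0, len);
            nanos[i] = System.nanoTime() - t0;
        }
        Arrays.sort(nanos);
        return nanos;
    }

    /** Times random reads of len bytes from a file; returns sorted latencies in ns. */
    public static long[] timeFileReads(Path file, int len, int requests, long seed) throws IOException {
        Random rnd = new Random(seed);
        byte[] buf = new byte[len];
        long[] nanos = new long[requests];
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long maxPos = raf.length() - len + 1; // number of valid start offsets
            for (int i = 0; i < requests; i++) {
                long pos = Math.floorMod(rnd.nextLong(), maxPos);
                long t0 = System.nanoTime();
                raf.seek(pos);
                raf.readFully(buf);
                nanos[i] = System.nanoTime() - t0;
            }
        }
        Arrays.sort(nanos);
        return nanos;
    }

    /** Median of an already sorted array. */
    public static long median(long[] sorted) {
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[16 * 1024 * 1024]; // assumed size, far smaller than the real index
        new Random(42).nextBytes(data);
        Path file = Files.createTempFile("index-probe", ".bin");
        try {
            Files.write(file, data);
            long[] mem = timeMemoryReads(data, 4096, 10_000, 1L);
            long[] disk = timeFileReads(file, 4096, 10_000, 1L);
            System.out.printf("memory median: %d ns%n", median(mem));
            System.out.printf("file   median: %d ns%n", median(disk));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Reporting a median (or a full histogram) rather than a mean keeps a few slow outliers from hiding how close the two distributions are.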

UPDATE #2. I plotted the response time for each index request. The lines for memory and for disk match almost exactly. I ran some other tests using the O_DIRECT flag to open the file (thanks to JNA!), and in that case the disk version of the code is (obviously) slower than the in-memory one. So I conclude that the "problem" was that Linux's aggressive disk caching is pretty awesome.

UPDATE #3: http://www.nicecode.eu/java-streams-for-direct-io/

+4
2 answers

No, a disk can never be as fast as RAM (RAM is on the order of 100,000 times faster than a magnetic disk). Most likely, the OS is caching your file in memory for you.

+2

Three possibilities off the top of my head:

  • The operating system is already keeping the entire index file in memory through the file system cache. (I would still expect some overhead, mind you.)
  • The index is not the bottleneck of the code you are testing.
  • Your benchmarking methodology is not entirely correct. (Benchmarking can be very difficult to get right.)

The middle option seems to me the most likely.
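On the methodology point, one common precaution in Java is to run the measured code several times untimed before recording anything, so the JIT has compiled it, and to consume the results so the work is not optimized away. A rough sketch of that idea follows; the class `WarmupBench`, the `checksum` stand-in workload, and the iteration counts are all hypothetical and not taken from the question.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

public class WarmupBench {

    // Results are written here so the JIT cannot discard the measured work.
    static volatile long sink;

    /** Runs task warmup times untimed, then runs times timed; returns sorted latencies in ns. */
    public static long[] measure(LongSupplier task, int warmup, int runs) {
        long acc = 0;
        for (int i = 0; i < warmup; i++) {
            acc += task.getAsLong();
        }
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long t0 = System.nanoTime();
            acc += task.getAsLong();
            nanos[i] = System.nanoTime() - t0;
        }
        sink = acc;
        Arrays.sort(nanos);
        return nanos;
    }

    /** Stand-in workload: sums every byte of a byte[][] "index". */
    public static long checksum(byte[][] index) {
        long sum = 0;
        for (byte[] row : index) {
            for (byte b : row) {
                sum += b;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[][] index = new byte[1024][4096]; // assumed toy size
        long[] nanos = measure(() -> checksum(index), 50, 200);
        System.out.printf("median: %d ns, max: %d ns%n",
                nanos[nanos.length / 2], nanos[nanos.length - 1]);
    }
}
```

A dedicated harness such as JMH handles these pitfalls more thoroughly; the sketch only shows the basic shape.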

+5