I am writing code that accesses an inverted index. I have two interchangeable classes that read the index. One reads the index from disk, buffering part of it. The other loads the index completely into memory as a byte[][] (the index is about 7 GB) and reads from that multidimensional array. One would expect better results with all the data in memory, but my measurements show that reading the index from disk is just as fast as having it in memory. (The time taken to load the index into memory is not included in the measurements.)
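For context, a minimal sketch of what the two interchangeable readers might look like. The interface and class names here are hypothetical, not taken from the actual code:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical common interface shared by both implementations.
interface PostingsReader {
    byte[] read(long offset, int length) throws IOException;
}

// Disk-backed reader: seeks into the index file for every request.
class DiskPostingsReader implements PostingsReader {
    private final RandomAccessFile file;

    DiskPostingsReader(String path) throws IOException {
        this.file = new RandomAccessFile(path, "r");
    }

    public byte[] read(long offset, int length) throws IOException {
        byte[] buf = new byte[length];
        file.seek(offset);
        file.readFully(buf);
        return buf;
    }
}

// In-memory reader: the whole index is pre-loaded into byte[][] chunks
// (a single byte[] cannot hold 7 GB, so it is split into blocks).
class MemoryPostingsReader implements PostingsReader {
    private static final int BLOCK_SIZE = 1 << 30; // 1 GiB per block
    private final byte[][] blocks;

    MemoryPostingsReader(byte[][] blocks) {
        this.blocks = blocks;
    }

    public byte[] read(long offset, int length) {
        byte[] buf = new byte[length];
        int block = (int) (offset / BLOCK_SIZE);
        int pos = (int) (offset % BLOCK_SIZE);
        int copied = 0;
        while (copied < length) {
            // Copy up to the end of the current block, then continue in the next one.
            int n = Math.min(length - copied, BLOCK_SIZE - pos);
            System.arraycopy(blocks[block], pos, buf, copied, n);
            copied += n;
            block++;
            pos = 0;
        }
        return buf;
    }
}
```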
Why is this happening? Any ideas?
Additional information: I am running the code under HPROF. In both the "on disk" and "in memory" runs, the most-executed code is NOT the code directly related to reading. Also, to my (limited) understanding, the GC profiler shows no problems with GC.
UPDATE #1: I instrumented my code to measure I/O time. It seems that most memory requests take 0-2000 ns, while most disk requests take 1000-3000 ns. The latter figure seems too low to me. Is this related to Linux disk caching? Is there a way to exclude disk caching for benchmarking purposes?
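The per-request timing was done roughly as follows (a sketch, not the actual benchmark code; `PostingsReader` is the hypothetical interface from the sketch above):

```java
// Times a single index read and returns its latency in nanoseconds.
static long timeRead(PostingsReader reader, long offset, int length) throws java.io.IOException {
    long start = System.nanoTime();
    byte[] ignored = reader.read(offset, length);
    return System.nanoTime() - start;
}
```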
UPDATE #2: I plotted the response time for each index request. The lines for memory and for disk match almost exactly. I ran some additional tests using the O_DIRECT flag to open the file (thanks to JNA!), and in that case the disk version of the code is (obviously) slower than the memory one. So I conclude that the "problem" was that Linux's aggressive disk caching is simply that good.
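For reference, opening the file with O_DIRECT through JNA looks roughly like this. This is a sketch under several assumptions: the O_DIRECT value shown is the Linux x86/x86-64 one, the file path is a placeholder, and O_DIRECT requires block-aligned offsets, lengths, and buffer addresses (typically 512 or 4096 bytes):

```java
import com.sun.jna.Library;
import com.sun.jna.Memory;
import com.sun.jna.Native;
import com.sun.jna.Pointer;

// Minimal libc binding for open/read/close with O_DIRECT.
interface LibC extends Library {
    // Native.load is JNA 5.x; older JNA versions use Native.loadLibrary instead.
    LibC INSTANCE = Native.load("c", LibC.class);

    int O_RDONLY = 0;
    int O_DIRECT = 040000; // Linux x86/x86-64 value; check <fcntl.h> on other platforms

    int open(String path, int flags);
    long read(int fd, Pointer buf, long count);
    int close(int fd);
}

class DirectReadExample {
    private static final int BLOCK = 4096;

    public static void main(String[] args) {
        int fd = LibC.INSTANCE.open("/path/to/index", LibC.O_RDONLY | LibC.O_DIRECT);
        if (fd < 0) throw new RuntimeException("open failed");

        // Over-allocate and round up to a block boundary, because O_DIRECT
        // requires the buffer address itself to be block-aligned.
        Memory raw = new Memory(BLOCK * 2);
        long base = Pointer.nativeValue(raw);
        long alignedOffset = ((base + BLOCK - 1) & ~(long) (BLOCK - 1)) - base;
        Pointer buf = raw.share(alignedOffset);

        long n = LibC.INSTANCE.read(fd, buf, BLOCK);
        System.out.println("read " + n + " bytes with O_DIRECT (page cache bypassed)");
        LibC.INSTANCE.close(fd);
    }
}
```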
UPDATE #3: http://www.nicecode.eu/java-streams-for-direct-io/