Random access to a large binary file

I have a large binary file (12 GB) from which I want to assemble a smaller binary file (16 KB) on the fly. Assume the file is on disk and that the bytes for the smaller file are scattered more or less randomly throughout the large one. What is the best and fastest way to do this? So far I have not been able to get it below about three minutes.

Things I've tried, all with roughly the same performance:

  • Converting the file to HDF5 format and using the C interface (slow).
  • Writing a small C program that uses fseek() to move through the file (slow).

How can I access this data quickly?

I would like each request to take less than a couple of seconds.

+5
7 answers

The answer is basically no.

A single mechanical disk drive takes about 10 ms to perform a seek, because it has to move the disk head. 16,000 seeks (one per byte of the 16 KB output, in the worst case) at 10 ms per seek is 160 seconds. It makes no difference how you write your code; using mmap(), for example, will not help.
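The estimate above is simple enough to write down as a function. This is only a back-of-the-envelope sketch: the name seek_bound_seconds is made up for illustration, and the 10 ms figure is the rough per-seek cost quoted above, not a measurement of any particular drive.

```c
#include <assert.h>
#include <stddef.h>

/* Rough lower bound on total time when every request costs one full seek.
 * seek_ms is the assumed average seek time of the drive, in milliseconds. */
double seek_bound_seconds(size_t n_seeks, double seek_ms)
{
    assert(seek_ms >= 0.0);
    return (double)n_seeks * seek_ms / 1000.0;
}
```

With 16,000 seeks at 10 ms each this gives 160 seconds, which is in line with the roughly three minutes reported in the question.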

Welcome to the physical world, software person :-). You must improve the locality of your operations.

First, sort the locations you are accessing. Nearby locations in the file are likely to be nearby on disk, and seeking between nearby locations is faster than seeking randomly.
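A minimal sketch of the sorting idea, assuming a POSIX system and that each request is a single byte at a known offset. The names struct request and gather_sorted are hypothetical, chosen for this example only. The offsets are sorted before reading, but each byte is still written to its original position in the output.

```c
#define _XOPEN_SOURCE 700      /* for pread() and mkstemp() */
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t, needed for a 12 GB file on 32-bit systems */
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* One requested byte: where it lives in the large file,
 * and which slot of the output buffer it belongs in. */
struct request {
    off_t  offset;
    size_t index;
};

/* qsort comparator: ascending file offset. */
static int by_offset(const void *a, const void *b)
{
    const struct request *ra = a, *rb = b;
    return (ra->offset > rb->offset) - (ra->offset < rb->offset);
}

/* Read one byte at each of the n offsets from fd into out[0..n-1].
 * Reads are issued in ascending offset order to reduce head movement.
 * Returns 0 on success, -1 on allocation or read failure. */
int gather_sorted(int fd, const off_t *offsets, size_t n, unsigned char *out)
{
    if (n == 0)
        return 0;
    assert(offsets != NULL && out != NULL);

    struct request *reqs = malloc(n * sizeof *reqs);
    if (reqs == NULL)
        return -1;

    for (size_t i = 0; i < n; i++) {
        reqs[i].offset = offsets[i];
        reqs[i].index  = i;
    }
    qsort(reqs, n, sizeof *reqs, by_offset);

    int rc = 0;
    for (size_t i = 0; i < n; i++) {
        /* pread() does not move the file position, so no separate seek call is needed. */
        if (pread(fd, &out[reqs[i].index], 1, reqs[i].offset) != 1) {
            rc = -1;
            break;
        }
    }
    free(reqs);
    return rc;
}
```

Sorting does not remove the seeks, it only makes them shorter on average, so the gain depends on how the file is laid out on disk and should be measured rather than assumed.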

, , 100 ; 1 , . , 1 , , . ( , .)

, RAID ( ). , , .

- , , . , . (, , ).

[edit]

@JeremyP SSD - . , 0,1 . , , 50-100 . ( , 1- TB, SSD- .)

[edit 2]

@FrankH , , , , , . , (, XFS) "" (, posix_fallocate , ).

+11

, , , , , 96 kB, .

? () ; () , .

, , - readahead, ; , , fadvise(fd, 0, MAX_OFFSET, FADV_RANDOM); filedescriptor . madvise(), mmap() . , ( , ). , .
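The fadvise(fd, 0, MAX_OFFSET, FADV_RANDOM) call named above corresponds, on POSIX systems, to posix_fadvise() with POSIX_FADV_RANDOM. A minimal sketch, assuming a POSIX system; the wrapper name open_for_random_reads is invented for this example. A length of 0 means "from the offset to the end of the file", so no MAX_OFFSET constant is needed.

```c
#define _XOPEN_SOURCE 700   /* for posix_fadvise() and pread() */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Open path read-only and tell the kernel the whole file will be read
 * in random order, so that readahead is not wasted on it.
 * Returns the file descriptor, or -1 on failure. */
int open_for_random_reads(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* offset 0, len 0: the advice covers the entire file.
     * posix_fadvise() returns 0 on success and an error number otherwise. */
    if (posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The advice is only a hint, and this sketch treats a failed hint as an error purely to keep the example short; a real program might prefer to carry on with the open descriptor. For a mapping created with mmap(), posix_madvise() with POSIX_MADV_RANDOM plays the same role.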

, N , M msec, N * m ( ...). .

: :

, . :

  • , (.. N+1 , N , ). / , , ( , ) .
  • , - - (UN * X preadv() ), .
  • / - / ; , , ., , statvfs() ioctl_list. , , , Nemo ( "" , ).
  • , , FIEMAP/FIBMAP ( Windows FSCTL_GET_RETRIEVAL_POINTERS), , , ( "" , , ).
  • , () , , , / .

, , . / RAM, (/ ).

+4

mmaping ? ( , mmap64). .

, , SSD, . , ?

?

+1

, , . , 1 Gigabit/second, , 12 x 8 = 96 . , , .

, , , , , , , , , , , .

SSD, , , , , ...

+1

, , . 16 , ? 12- GB ? .

? .

0

( , ): - , - , POSIX, posix_fadvise(), OS .

0

. , .. preadv, FrankH.

, - , , , RAID- .

, -, . , / ( , , ). , "noop" , "" , cfq , .

0