Optimize file open and read

I have a C++ application running on Windows that wakes up every 15 minutes to open and read the files present in a directory. The directory changes on every run.

  • Opening is done with ifstream::open(file_name, std::ios::binary)
  • Reading is done through the stream's underlying streambuf*, obtained via ios::rdbuf()
  • The total number of files every 15 minutes is about 50,000.
  • Files are opened and read in batches of 20
  • Each file is about 50 KB in size.
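For reference, the open-and-read pattern described above boils down to something like this small sketch (the function name is hypothetical):

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Read one file as described: open in binary mode, then drain the
// stream's underlying streambuf into a string in one bulk copy.
std::string read_file(const std::string& file_name) {
    std::ifstream in(file_name, std::ios::binary);
    std::ostringstream contents;
    contents << in.rdbuf();   // bulk transfer via the streambuf
    return contents.str();
}
```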

For each run, this operation (open and read) takes about 18-23 minutes on a dual-core machine with a 6,000 rpm spindle disk. I captured the page faults/sec performance counter, and it ranges from 8,000 to 10,000.

Is there a way to reduce the page faults and optimize the opening and reading of the files?

Gowtham

+4
4 answers

Do not use the STL (iostreams) if you can avoid it. It handles very complex internationalization and translation/conversion issues, which slows it down.

Most often, the fastest way to read a file is to memory-map it (on Windows, CreateFileMapping is the starting point). If at all possible, use one file with a total size of 50,000 * 50 KB and index into it directly when writing and reading. You should also consider using a DB (even SQLite) if the data is structured at all. This amount of data is small enough that it should always stay in memory. You could also try a RAM disk so you do not go to disk at all (though this sacrifices recoverability in the event of a hardware or power failure).
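The memory-mapping idea can be sketched as follows. The answer names the Windows route (CreateFileMapping / MapViewOfFile); for a self-contained illustration this sketch uses the equivalent POSIX mmap call, and the function name is hypothetical:

```cpp
#include <fcntl.h>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Read a whole file through a memory mapping. On Windows the equivalent
// sequence is CreateFile -> CreateFileMapping -> MapViewOfFile.
std::string read_mapped(const char* path) {
    std::string result;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return result;
    struct stat st{};
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        void* p = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            // The file contents are now directly addressable memory.
            result.assign(static_cast<const char*>(p), st.st_size);
            munmap(p, st.st_size);
        }
    }
    close(fd);
    return result;
}
```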

+3

First of all, thanks for all the answers. They were very helpful and gave us a lot to learn from.

We removed the STL and used plain C I/O (fopen and fread). This slightly improved open-and-read performance for the data above, down to 16-17 minutes.
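A minimal sketch of the fopen/fread path we switched to, reading each file in a single call (the function name is hypothetical):

```cpp
#include <cstdio>
#include <string>

// Read an entire file with C stdio: find its size, then pull the
// whole contents in with one fread.
std::string read_stdio(const char* path) {
    std::string data;
    FILE* f = std::fopen(path, "rb");
    if (!f) return data;
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);
    if (size > 0) {
        data.resize(static_cast<size_t>(size));
        size_t got = std::fread(&data[0], 1, data.size(), f);
        data.resize(got);   // shrink if the read came up short
    }
    std::fclose(f);
    return data;
}
```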

We really nailed the problem by compressing the files. This reduced the size of each file from 50 KB to 8 KB, and the time spent on the open and read operations dropped to 4-5 minutes.

Thanks.

+1

According to the MS Platform SDK documentation, file caching can be used. And, IMHO, since you mentioned Windows, the native CreateFile, ReadFile and CloseHandle calls with the appropriate flags may give better performance than the STL.

But, on the other hand, from your post it sounds like you are only reading, so caching cannot significantly improve performance. Still, since the processor is fast and disk operations are slow, you can use intermediate buffers together with multithreading, that is, running parallel read threads.
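A portable sketch of the parallel-read idea, using std::thread and C stdio rather than the native CreateFile/ReadFile calls the answer mentions (function names and worker count are illustrative):

```cpp
#include <cstdio>
#include <string>
#include <thread>
#include <vector>

// Helper: read one file into a string with C stdio.
static std::string slurp(const std::string& path) {
    std::string data;
    if (FILE* f = std::fopen(path.c_str(), "rb")) {
        std::fseek(f, 0, SEEK_END);
        long size = std::ftell(f);
        std::fseek(f, 0, SEEK_SET);
        if (size > 0) {
            data.resize(static_cast<size_t>(size));
            data.resize(std::fread(&data[0], 1, data.size(), f));
        }
        std::fclose(f);
    }
    return data;
}

// Stripe the file list across a few worker threads so that reads of
// different files are issued concurrently.
std::vector<std::string> read_parallel(const std::vector<std::string>& files,
                                       unsigned workers = 4) {
    std::vector<std::string> contents(files.size());
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            for (size_t i = w; i < files.size(); i += workers)
                contents[i] = slurp(files[i]);   // each index touched once
        });
    }
    for (auto& t : pool) t.join();
    return contents;
}
```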

0
  • Perhaps you can use something like memoisation: if a file has not changed (you can store its last-modified time), reuse the contents from the previous run, that is, keep them in memory.

  • I think you do not need FS caching. That is, it is better to open the files in O_DIRECT mode (this is Linux, but I'm sure Windows has something similar: FILE_FLAG_NO_BUFFERING) and read each file in one I/O: allocate a buffer the size of the file and read the whole file into it with a single call. This should significantly reduce the load on the CPU and memory.

  • The multithreading suggested above will also help, but not by much. I suspect the bottleneck is the disk, which can perform only a limited number of I/O operations per second (100 might be a reasonable estimate). So you need to reduce the number of I/O operations, for example using (1) and (2) above, or something else.
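A minimal sketch of the memoisation in suggestion (1), caching by last-modified time with C++17 std::filesystem (the class and its names are hypothetical):

```cpp
#include <filesystem>
#include <fstream>
#include <map>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

// Cache file contents keyed by path and invalidated by last-write time,
// so files unchanged since the previous 15-minute run are served from memory.
class FileCache {
public:
    const std::string& get(const fs::path& path) {
        auto stamp = fs::last_write_time(path);
        auto it = cache_.find(path.string());
        if (it == cache_.end() || it->second.stamp != stamp) {
            // New or modified file: (re)read it from disk.
            std::ifstream in(path, std::ios::binary);
            std::ostringstream buf;
            buf << in.rdbuf();
            it = cache_.insert_or_assign(path.string(),
                                         Entry{stamp, buf.str()}).first;
        }
        return it->second.data;
    }

private:
    struct Entry { fs::file_time_type stamp; std::string data; };
    std::map<std::string, Entry> cache_;
};
```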

0
