fwrite performance for a massive number of small records

I have a program that saves many large files (> 1 GB) using fwrite. It works fine, but unfortunately, due to the nature of the data, each fwrite call writes only 1-4 bytes. As a result, a write can take more than an hour, and most of this time apparently goes to overhead (or at least to the fwrite library function). I have a similar problem with fread.

Does anyone know of any existing functions or libraries that will buffer these writes and reads using the built-in functions, or is this another roll-your-own?

+6
6 answers

First of all, fwrite() is a library function, not a system call. Secondly, it already buffers the data.

You might want to experiment with increasing the size of the buffer. This is done using setvbuf(). On my system this only helps a little, but YMMV.
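
For instance, a minimal sketch (the 1 MB buffer size and the out.bin filename are arbitrary choices for illustration):

    #include <stdio.h>

    int main ()
    {
        // 1 MB is an arbitrary size chosen for illustration; experiment to taste
        static char big_buffer [1 << 20];

        FILE *fp = fopen ("out.bin", "wb");
        if (!fp) { perror ("fopen"); return 1; }

        // setvbuf must be called after fopen() but before any I/O on the stream
        setvbuf (fp, big_buffer, _IOFBF, sizeof big_buffer);

        for (int i = 0 ; i < 1000000 ; ++i)
            fwrite (&i, sizeof i, 1, fp);   // many tiny writes, as in the question

        fclose (fp);
        return 0;
    }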

If setvbuf() does not help, you could do your own buffering and only call fwrite() once you have accumulated enough data. This requires more work, but will almost certainly speed up the writes, as your own buffering can be made much lighter-weight than fwrite()'s.
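
A minimal sketch of that approach, with made-up names (my_buffer, my_putrecord, my_flush) and error handling omitted:

    #include <stdio.h>
    #include <string.h>

    static char my_buffer [1 << 20];   // 1 MB staging buffer, size chosen arbitrarily
    static size_t my_fill = 0;

    // stage one small record; flush with a single big fwrite when the buffer fills
    static void my_putrecord (const void *rec, size_t len, FILE *fp)
    {
        if (my_fill + len > sizeof my_buffer) {
            fwrite (my_buffer, 1, my_fill, fp);
            my_fill = 0;
        }
        memcpy (my_buffer + my_fill, rec, len);
        my_fill += len;
    }

    // call once before fclose() to write out whatever is left
    static void my_flush (FILE *fp)
    {
        if (my_fill) {
            fwrite (my_buffer, 1, my_fill, fp);
            my_fill = 0;
        }
    }

(This is essentially what the benchmark answer further down does with its my_fwrite helper.)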

Edit: If someone tells you that it's the number of fwrite() calls that is the problem, ask for evidence. Better still, do your own performance tests. On my computer, 500,000,000 two-byte writes using fwrite() take 11 seconds. This corresponds to a throughput of about 90 MB/s.

And last but not least, the huge discrepancy between the 11 seconds in my test and the one hour mentioned in your question suggests that something else is going on in your code that is causing the very poor performance.

+9

Your problem is not the buffering in fwrite(), but the total overhead of making many library calls with small amounts of data. If you write just 1 MB of data in 4-byte pieces, you make 250,000 function calls. You would do better to collect your data in memory and then write it to disk with a single fwrite() call.

UPDATE : if you need proof:

    $ dd if=/dev/zero of=/dev/null count=50000000 bs=2
    50000000+0 records in
    50000000+0 records out
    100000000 bytes (100 MB) copied, 55.3583 s, 1.8 MB/s

    $ dd if=/dev/zero of=/dev/null count=50 bs=2000000
    50+0 records in
    50+0 records out
    100000000 bytes (100 MB) copied, 0.0122651 s, 8.2 GB/s
+5

Well, this was interesting. I thought I would write some actual code to see what the speed was. And here it is. Compiled with DevStudio 2010 Express (C++). There is quite a bit of code here. It times 5 ways of writing the data:

  • Naively calling fwrite
  • Using a buffer and making fewer, larger fwrite calls
  • Using the Win32 API naively
  • Using a buffer and making fewer, larger Win32 calls
  • Using Win32, but double-buffering the output and using asynchronous writes

Please check that I have not done something a little stupid with any of the above.

The program uses QueryPerformanceCounter to time the code, and stops timing after closing the file so as to include any pending internal buffered data.

Results on my machine (an old WinXP SP3 box):

  • fwrite by itself is the fastest, although the buffered version can sometimes beat it if you get the size and iterations just right.
  • Naive Win32 is much slower.
  • Buffered Win32 doubles the speed, but it is still easily beaten by fwrite.
  • Asynchronous writes were not significantly better than the buffered version. Perhaps someone can check my code and make sure I didn't do something stupid, as I have never used asynchronous I/O before.

You may get different results depending on your setup.

Feel free to edit and improve the code.

    #define _CRT_SECURE_NO_WARNINGS
    #include <stdio.h>
    #include <memory.h>
    #include <Windows.h>

    const int
      // how many times fwrite/my_fwrite is called
      c_iterations = 10000000,
      // the size of the buffer used by my_fwrite
      c_buffer_size = 100000;

    char
      buffer1 [c_buffer_size],
      buffer2 [c_buffer_size],
      *current_buffer = buffer1;

    int write_ptr = 0;

    __int64 write_offset = 0;

    OVERLAPPED overlapped = {0};

    // write to a buffer, when buffer full, write the buffer to the file using fwrite
    void my_fwrite (void *ptr, int size, int count, FILE *fp)
    {
      const int c = size * count;
      if (write_ptr + c > c_buffer_size)
      {
        fwrite (buffer1, write_ptr, 1, fp);
        write_ptr = 0;
      }
      memcpy (&buffer1 [write_ptr], ptr, c);
      write_ptr += c;
    }

    // write to a buffer, when buffer full, write the buffer to the file using
    // Win32 WriteFile
    void my_fwrite (void *ptr, int size, int count, HANDLE fp)
    {
      const int c = size * count;
      if (write_ptr + c > c_buffer_size)
      {
        DWORD written;
        WriteFile (fp, buffer1, write_ptr, &written, 0);
        write_ptr = 0;
      }
      memcpy (&buffer1 [write_ptr], ptr, c);
      write_ptr += c;
    }

    // write to a double buffer, when buffer full, write the buffer to the file using
    // asynchronous WriteFile (waiting for previous write to complete)
    void my_fwrite (void *ptr, int size, int count, HANDLE fp, HANDLE wait)
    {
      const int c = size * count;
      if (write_ptr + c > c_buffer_size)
      {
        WaitForSingleObject (wait, INFINITE);
        overlapped.Offset = write_offset & 0xffffffff;
        overlapped.OffsetHigh = write_offset >> 32;
        overlapped.hEvent = wait;
        WriteFile (fp, current_buffer, write_ptr, 0, &overlapped);
        write_offset += write_ptr;
        write_ptr = 0;
        current_buffer = current_buffer == buffer1 ? buffer2 : buffer1;
      }
      memcpy (current_buffer + write_ptr, ptr, c);
      write_ptr += c;
    }

    int main ()
    {
      // do lots of little writes
      FILE *f1 = fopen ("f1.bin", "wb");
      LARGE_INTEGER f1_start, f1_end;
      QueryPerformanceCounter (&f1_start);
      for (int i = 0 ; i < c_iterations ; ++i)
      {
        fwrite (&i, sizeof i, 1, f1);
      }
      fclose (f1);
      QueryPerformanceCounter (&f1_end);

      // do a few big writes
      FILE *f2 = fopen ("f2.bin", "wb");
      LARGE_INTEGER f2_start, f2_end;
      QueryPerformanceCounter (&f2_start);
      for (int i = 0 ; i < c_iterations ; ++i)
      {
        my_fwrite (&i, sizeof i, 1, f2);
      }
      if (write_ptr)
      {
        fwrite (buffer1, write_ptr, 1, f2);
        write_ptr = 0;
      }
      fclose (f2);
      QueryPerformanceCounter (&f2_end);

      // use Win32 API, without buffer
      HANDLE f3 = CreateFile (TEXT ("f3.bin"), GENERIC_WRITE, 0, 0, CREATE_ALWAYS,
                              FILE_ATTRIBUTE_NORMAL, 0);
      LARGE_INTEGER f3_start, f3_end;
      QueryPerformanceCounter (&f3_start);
      for (int i = 0 ; i < c_iterations ; ++i)
      {
        DWORD written;
        WriteFile (f3, &i, sizeof i, &written, 0);
      }
      CloseHandle (f3);
      QueryPerformanceCounter (&f3_end);

      // use Win32 API, with buffer
      HANDLE f4 = CreateFile (TEXT ("f4.bin"), GENERIC_WRITE, 0, 0, CREATE_ALWAYS,
                              FILE_FLAG_WRITE_THROUGH, 0);
      LARGE_INTEGER f4_start, f4_end;
      QueryPerformanceCounter (&f4_start);
      for (int i = 0 ; i < c_iterations ; ++i)
      {
        my_fwrite (&i, sizeof i, 1, f4);
      }
      if (write_ptr)
      {
        DWORD written;
        WriteFile (f4, buffer1, write_ptr, &written, 0);
        write_ptr = 0;
      }
      CloseHandle (f4);
      QueryPerformanceCounter (&f4_end);

      // use Win32 API, with double buffering
      HANDLE
        f5 = CreateFile (TEXT ("f5.bin"), GENERIC_WRITE, 0, 0, CREATE_ALWAYS,
                         FILE_FLAG_OVERLAPPED | FILE_FLAG_WRITE_THROUGH, 0),
        wait = CreateEvent (0, false, true, 0);
      LARGE_INTEGER f5_start, f5_end;
      QueryPerformanceCounter (&f5_start);
      for (int i = 0 ; i < c_iterations ; ++i)
      {
        my_fwrite (&i, sizeof i, 1, f5, wait);
      }
      if (write_ptr)
      {
        WaitForSingleObject (wait, INFINITE);
        overlapped.Offset = write_offset & 0xffffffff;
        overlapped.OffsetHigh = write_offset >> 32;
        overlapped.hEvent = wait;
        WriteFile (f5, current_buffer, write_ptr, 0, &overlapped);
        WaitForSingleObject (wait, INFINITE);
        write_ptr = 0;
      }
      CloseHandle (f5);
      QueryPerformanceCounter (&f5_end);
      CloseHandle (wait);

      LARGE_INTEGER freq;
      QueryPerformanceFrequency (&freq);
      printf ("     fwrites without buffering = %dms\n", (int) ((1000 * (f1_end.QuadPart - f1_start.QuadPart)) / freq.QuadPart));
      printf ("        fwrites with buffering = %dms\n", (int) ((1000 * (f2_end.QuadPart - f2_start.QuadPart)) / freq.QuadPart));
      printf ("       Win32 without buffering = %dms\n", (int) ((1000 * (f3_end.QuadPart - f3_start.QuadPart)) / freq.QuadPart));
      printf ("          Win32 with buffering = %dms\n", (int) ((1000 * (f4_end.QuadPart - f4_start.QuadPart)) / freq.QuadPart));
      printf ("   Win32 with double buffering = %dms\n", (int) ((1000 * (f5_end.QuadPart - f5_start.QuadPart)) / freq.QuadPart));
    }
+2

Mainly: small fwrite()s are slower because each fwrite has to validate its parameters, do the equivalent of flockfile(), possibly fflush(), append the data, and return success: this overhead adds up. It is not as much as with tiny calls to write(2), but it is still noticeable.

Evidence:

    #include <stdio.h>
    #include <stdlib.h>

    static void w(const void *buf, size_t nbytes)
    {
        size_t n;
        if (!nbytes)
            return;
        n = fwrite(buf, 1, nbytes, stdout);
        if (n >= nbytes)
            return;
        if (!n) {
            perror("stdout");
            exit(111);
        }
        w((const char *)buf + n, nbytes - n);  /* retry the unwritten tail */
    }

    /* Usage: time $0 [write_size] <$bigfile >/dev/null */
    int main(int argc, char *argv[])
    {
        char buf[32 * 1024];
        size_t sz;

        sz = argc > 1 ? (size_t)atoi(argv[1]) : 0;
        if (sz > sizeof(buf))
            return 111;
        if (sz == 0)
            sz = sizeof(buf);

        for (;;) {
            size_t r = fread(buf, 1, sz, stdin);
            if (r < 1)
                break;
            w(buf, r);
        }
        return 0;
    }

Having said that, you can do what many commenters have suggested, i.e. add your own buffering in front of fwrite: it is very trivial code, but you should measure whether it actually gains you anything.

If you don't want to roll your own, you can use, for example, the buffer interface in skalibs, but you will probably take longer reading the documentation than writing the buffering yourself (IMHO).

0

You could just roll your own buffer but, fortunately, standard C++ has what you are asking for. Just use std::ofstream:

    // open and init
    char mybuffer [1024];
    std::ofstream filestr("yourfile");
    filestr.rdbuf()->pubsetbuf(mybuffer, 1024);

    // write your data
    filestr.write(data, datasize);

Edited: mistake, use ofstream rather than fstream, since it is not clear from the standard which buffer (input or output?) it would apply to.
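
One caveat worth adding: the standard leaves the effect of pubsetbuf() on an already-open file buffer implementation-defined, so a more portable sketch sets the buffer before opening (the filename and buffer size are placeholders):

    #include <fstream>

    int main ()
    {
        char mybuffer [1024];
        std::ofstream filestr;                                   // not yet open
        filestr.rdbuf()->pubsetbuf(mybuffer, sizeof mybuffer);   // set buffer first
        filestr.open("yourfile", std::ios::binary);              // then open
        filestr.write("data", 4);
        return 0;
    }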

-1

The point of the FILE * layer in stdio is that it does the buffering for you. This saves you the overhead of system calls. As others have noted, one thing that can still be a problem is the library-call overhead, which is considerably smaller. Another thing that can bite you is writing to many different places on the disk at the same time. (Disks spin, and the head takes on the order of 8 ms to get to the right place for a random write.)

If you determine that library-call overhead is the problem, I would recommend rolling your own trivial buffering using a std::vector and periodically flushing the vector to the file.
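
A rough sketch of that suggestion (the 4 MB flush threshold and the out.bin filename are arbitrary):

    #include <cstdio>
    #include <vector>

    int main ()
    {
        const size_t flush_threshold = 4 << 20;  // 4 MB, chosen arbitrarily
        std::vector<char> buf;
        buf.reserve(flush_threshold);

        FILE *fp = std::fopen("out.bin", "wb");
        if (!fp) return 1;

        for (int i = 0; i < 10000000; ++i) {
            // accumulate each small record in memory
            const char *p = reinterpret_cast<const char *>(&i);
            buf.insert(buf.end(), p, p + sizeof i);
            // flush the accumulated data in one big write
            if (buf.size() >= flush_threshold) {
                std::fwrite(buf.data(), 1, buf.size(), fp);
                buf.clear();
            }
        }
        if (!buf.empty())   // final partial flush
            std::fwrite(buf.data(), 1, buf.size(), fp);

        std::fclose(fp);
        return 0;
    }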

If the problem is that you have many writes scattered across the disk, try increasing the size of the buffer with setvbuf(). If you can, try a number around 4 MB.

-1
source
