Is there a faster write method than fseek and fwrite?

I have a 1 GB binary file that basically contains a 3D cube of values of the same type. Saving such a cube in a different order ([x, y, z] or [zx, y]) takes a lot of time with fseek and fwrite. But one software package does this much faster than my program. Is there any approach to writing files that is faster than fseek / fwrite?

+4
3 answers

You should not use fseek in the inner loop of your I/O operations. The write functions are fast because they buffer their writes. If you seek all over the place, you keep blowing that cache.

Do all of your transformations in memory: for example, reorder the cube in memory, and then write the file with a few sequential fwrite calls.

If you cannot transform all of your data in memory, then assemble the cube one plane at a time in memory and write out each plane.

@edit:

In your case, you do not want to use fseek at all. Not even one.

Do something like this:

    #include <stdio.h>
    #include <stdlib.h>

    void writeCubeZYX( int* cubeXYZ, int sizeOfCubeXYZ, FILE* file )
    {
        int* cubeZYX = malloc( sizeOfCubeXYZ );

        // all that monkey business you're doing with fseek is done inside this
        // function, copying memory to memory. No file IO operations in here.
        transformCubeXYZ_to_ZYX( cubeXYZ, cubeZYX, sizeOfCubeXYZ );

        // one big fat very fast fwrite. Optimal use of file io cache.
        // Note the argument order: buffer, element size, element count, stream.
        fwrite( cubeZYX, 1, sizeOfCubeXYZ, file );

        free( cubeZYX ); // quiet pedantry.
    }
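The helper transformCubeXYZ_to_ZYX is left undefined in the answer. Here is a minimal sketch of what the in-memory reorder plus a single write might look like. It assumes the cube holds int values, that [XYZ] means x varies fastest and z slowest in memory, and that [ZYX] means z varies fastest and x slowest. The names transform_xyz_to_zyx and write_cube_zyx and the explicit nx, ny, nz parameters are made up for illustration and are not part of the original answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reorder an nx*ny*nz cube from [XYZ] order
   (assumed: x varies fastest, z slowest) into [ZYX] order
   (assumed: z varies fastest, x slowest). Pure memory-to-memory copy. */
void transform_xyz_to_zyx(const int *src, int *dst,
                          size_t nx, size_t ny, size_t nz)
{
    for (size_t x = 0; x < nx; x++)
        for (size_t y = 0; y < ny; y++)
            for (size_t z = 0; z < nz; z++)
                dst[z + nz * (y + ny * x)] = src[x + nx * (y + ny * z)];
}

/* Reorder in memory, then hand the whole buffer to one sequential fwrite.
   Returns 0 on success, -1 on allocation or write failure. */
int write_cube_zyx(const int *cube_xyz,
                   size_t nx, size_t ny, size_t nz, FILE *file)
{
    assert(cube_xyz != NULL && file != NULL);

    size_t count = nx * ny * nz;
    int *cube_zyx = malloc(count * sizeof *cube_zyx);
    if (cube_zyx == NULL)
        return -1;

    transform_xyz_to_zyx(cube_xyz, cube_zyx, nx, ny, nz);

    /* fwrite takes: buffer, element size, element count, stream. */
    size_t written = fwrite(cube_zyx, sizeof *cube_zyx, count, file);

    free(cube_zyx);
    return written == count ? 0 : -1;
}
```

The inner loop writes the destination contiguously and reads the source with a stride. For very large cubes a blocked (tiled) loop may be kinder to the CPU cache, but either way no seek ever reaches the file.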

@edit2:

OK, suppose you cannot transform all of it in memory. Then transform it one plane at a time and write one plane at a time, in file order, with no fseeks.

So, the [XYZ] cube is laid out in memory as a series of Z [XY] matrices. That is, the [XY] planes of your cube are contiguous in memory. And you want to write it out as [ZYX]. So in the file you want to write a series of X [ZY] matrices. Each [ZY] plane will be contiguous in the file.

So you do something like this:

    void writeCubeZYX( int* cubeXYZ, int x, int y, int z, FILE* file )
    {
        size_t sizeOfPlaneZY = sizeof( int ) * y * z;
        int* planeZY = malloc( sizeOfPlaneZY );

        for ( int i = 0; i < x; i++ )
        {
            // all that monkey business you're doing with fseek is done inside this
            // function, extracting one ZY plane at a time. No file IO operations in here.
            extractZYPlane_from_CubeXYZ( cubeXYZ, planeZY, i );

            // in X big fat very fast fwrites. Near optimal use of file io cache.
            // Note the argument order: buffer, element size, element count, stream.
            fwrite( planeZY, 1, sizeOfPlaneZY, file );
        }

        free( planeZY ); // quiet pedantry.
    }
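The plane-extraction helper is not spelled out in the answer. Here is one possible sketch of it, together with a plane-by-plane writer, under the same assumptions about layout: [XYZ] has x varying fastest and z slowest, and each [ZY] plane is stored with z varying fastest. The names extract_zy_plane and write_cube_zyx_by_plane, and passing the dimensions explicitly, are illustrative choices rather than the original author's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: copy the [ZY] plane at a fixed x out of a cube
   stored in [XYZ] order (assumed: x fastest, z slowest). The plane is
   written with z varying fastest. No file I/O happens here. */
void extract_zy_plane(const int *cube_xyz, int *plane_zy,
                      size_t nx, size_t ny, size_t nz, size_t x)
{
    assert(x < nx);
    for (size_t y = 0; y < ny; y++)
        for (size_t z = 0; z < nz; z++)
            plane_zy[z + nz * y] = cube_xyz[x + nx * (y + ny * z)];
}

/* Write the cube in [ZYX] order using one reusable plane buffer and
   nx sequential fwrite calls. Returns 0 on success, -1 on failure. */
int write_cube_zyx_by_plane(const int *cube_xyz,
                            size_t nx, size_t ny, size_t nz, FILE *file)
{
    size_t plane_count = ny * nz;
    int *plane = malloc(plane_count * sizeof *plane);
    if (plane == NULL)
        return -1;

    int status = 0;
    for (size_t x = 0; x < nx && status == 0; x++) {
        extract_zy_plane(cube_xyz, plane, nx, ny, nz, x);
        /* Planes are written in file order, so no fseek is needed. */
        if (fwrite(plane, sizeof *plane, plane_count, file) != plane_count)
            status = -1;
    }

    free(plane);
    return status;
}
```

Only one plane of scratch memory (ny * nz values) is needed for the output. If the source cube itself does not fit in memory, the extraction step would have to read it from disk instead, which this sketch does not cover.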
+7

If you do a lot of random-access writes, I suggest you use mmap. mmap maps memory pages onto your file, and the OS manages them, much like the swap mechanism.

Another option is asynchronous I/O, which glibc provides: http://www.gnu.org/software/libc/manual/html_node/Asynchronous-I_002fO.html

It simply queues the data in memory and then creates another thread to perform the I/O.
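As a rough sketch of the mmap approach on a POSIX system, the file can be sized with ftruncate, mapped with MAP_SHARED, and then filled with ordinary array stores in any order. The function name write_cube_zyx_mmap and the layout assumptions ([XYZ] with x fastest, [ZYX] with z fastest) are illustrative. Whether this beats reordering in memory and calling fwrite depends on the platform and the data size.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical example: write an nx*ny*nz int cube to `path` in [ZYX]
   order through a shared file mapping. Returns 0 on success, -1 on failure. */
int write_cube_zyx_mmap(const char *path, const int *cube_xyz,
                        size_t nx, size_t ny, size_t nz)
{
    size_t bytes = nx * ny * nz * sizeof(int);
    assert(bytes > 0); /* mmap rejects zero-length mappings */

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* The file must already have its final size before it is mapped. */
    if (ftruncate(fd, (off_t)bytes) != 0) {
        close(fd);
        return -1;
    }

    int *out = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (out == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Read the source sequentially and scatter into the mapping; the OS
       decides when the dirty pages actually reach the disk. */
    for (size_t z = 0; z < nz; z++)
        for (size_t y = 0; y < ny; y++)
            for (size_t x = 0; x < nx; x++)
                out[z + nz * (y + ny * x)] = cube_xyz[x + nx * (y + ny * z)];

    int status = msync(out, bytes, MS_SYNC); /* flush before unmapping */
    if (munmap(out, bytes) != 0)
        status = -1;
    if (close(fd) != 0)
        status = -1;
    return status;
}
```

The scattered stores go to page-cache memory rather than through fseek and fwrite, so the program never issues a seek itself.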

+1

If you do not mind the file on disk being compressed, then it may be faster to compress it as you write it. This speeds things up because the bottleneck is usually writing bytes to disk, and by compressing on the fly you reduce the number of bytes that need to be written.

This, of course, depends on whether your data is compressible. One option for compressing output in C++ is gzip. For example: How to read / write gzip files?

But in your case, this may not be applicable. It is not clear from your question exactly when / why you are seeking. What write pattern do you expect?

0
