A faster way to create tab-delimited text files?

Many of my programs output huge amounts of data for viewing in Excel. The best way to view all of these files is to use a tab-delimited text format. I am currently using this piece of code to do this:

 ofstream output(fileName.c_str());
 for (int j = 0; j < dim; j++) {
     for (int i = 0; i < dim; i++)
         output << arrayPointer[j * dim + i] << " ";
     output << endl;
 }

This seems to be a very slow operation; is there a more efficient way to write text files like this to the hard drive?

Update:

Taking the two suggestions into account, the new code is:

 ofstream output(fileName.c_str());
 for (int j = 0; j < dim; j++) {
     for (int i = 0; i < dim; i++)
         output << arrayPointer[j * dim + i] << "\t";
     output << "\n";
 }
 output.close();

This writes to the hard drive at about 500 KB/s.

The binary version below, however, writes to the hard drive at 50 MB/s:

 {
     output.open(fileName.c_str(), std::ios::binary | std::ios::out);
     output.write(reinterpret_cast<char*>(arrayPointer),
                  std::streamsize(dim * dim * sizeof(double)));
     output.close();
 }
+6
c++
6 answers

Use C IO, it is much faster than C++ IO. I have heard of people failing programming contests solely because they used C++ IO instead of C IO.

 #include <cstdio>

 FILE* fout = fopen(fileName.c_str(), "w");
 for (int j = 0; j < dim; j++) {
     for (int i = 0; i < dim; i++)
         fprintf(fout, "%d\t", arrayPointer[j * dim + i]);
     fprintf(fout, "\n");
 }
 fclose(fout);

Just change %d to the correct type.

+6

Do not use endl. It flushes the stream buffers, which is potentially very inefficient. Instead, write:

 output << '\n'; 
+3

I decided to test JPvdMerwe's assertion that C stdio is faster than C++ I/O streams. (Spoiler: yes, but not necessarily by much.) For this, I used the following test programs:

Common wrapper code, omitted from the programs below:

 #include <iostream>
 #include <cstdio>

 int main (void)
 {
     // program code goes here
 }

Program 1: ordinary synchronized C++ I/O streams

 for (int j = 0; j < ROWS; j++) {
     for (int i = 0; i < COLS; i++) {
         std::cout << (i*j) << "\t";
     }
     std::cout << "\n";
 }

Program 2: unsynchronized C++ I/O streams

Same as program 1, except with std::cout.sync_with_stdio(false); prepended.

Program 3: C stdio printf()

 for (int j = 0; j < ROWS; j++) {
     for (int i = 0; i < COLS; i++) {
         printf("%d\t", i*j);
     }
     printf("\n");
 }

All programs were compiled with GCC 4.8.4 on Ubuntu Linux using the following command:

 g++ -Wall -ansi -pedantic -DROWS=10000 -DCOLS=1000 prog.cpp -o prog 

and timed with the command:

 time ./prog > /dev/null 

Below are the test results on my laptop (measured as wall-clock time):

  • Program 1 (synchronized C++ IO): 3.350s (= 100%)
  • Program 2 (unsynchronized C++ IO): 3.072s (= 92%)
  • Program 3 (C stdio): 2.592s (= 77%)

I also ran the same test with g++ -O2 to check the effect of optimization, and got the following results:

  • Program 1 (synchronized C++ IO) with -O2: 3.118s (= 100%)
  • Program 2 (unsynchronized C++ IO) with -O2: 2.943s (= 94%)
  • Program 3 (C stdio) with -O2: 2.734s (= 88%)

(The last number is not a fluke: program 3 consistently runs slower for me with -O2 than without it!)

So my conclusion is that, based on this test, C stdio really is 10-25% faster for this task than (synchronized) C++ IO. Using unsynchronized C++ IO saves about 5-10% compared to synchronized IO, but it is still slower than stdio.


P.S. I tried a few more variations:

  • Using std::endl instead of "\n" is, as expected, a bit slower, but the difference is less than 5% for the parameter values above. However, printing shorter output lines (e.g. -DROWS=1000000 -DCOLS=10) makes std::endl more than 30% slower than "\n".

  • Redirecting the output to a regular file instead of /dev/null slows all the programs down by about 0.2s, but makes no qualitative difference to the results.

  • Increasing the number of rows tenfold holds no surprises either; all programs simply take about 10 times longer, as expected.

  • Adding std::cout.sync_with_stdio(false); to program 3 has no noticeable effect.

  • Printing (double)(i*j) (with "%g\t" for printf()) slows down all three programs! Notably, program 3 is still the fastest, at only 9.3 seconds, whereas programs 1 and 2 each took a little over 14 seconds, so the gap widens to almost 40%! (And yes, I checked: the outputs are identical.) Using -O2 makes no significant difference here either.

+2

Does it have to be written in C++? If not, there are many existing tools, e.g. (g)awk (usable on Unix and Windows), that do a great job of transforming files like this, including large ones:

 awk '{$1=$1}1' OFS="\t" file 
+1

It may be faster to do it like this:

 ofstream output(fileName.c_str());
 for (int j = 0; j < dim; j++) {
     for (int i = 0; i < dim; i++)
         output << arrayPointer[j * dim + i] << '\t';
     output << '\n';
 }
+1
 ofstream output(fileName.c_str());
 for (int j = 0; j < dim; j++) {
     for (int i = 0; i < dim; i++)
         output << arrayPointer[j * dim + i] << '\t';
     output << endl;
 }

Use '\t' instead of " ".

0
