Why is my program slow? How can I increase the efficiency?

I have a program that performs a block nested loop join ( link text ). Basically, what it does is read the contents of one file (say, a 10 GB file) into buffer 1 (say 400 MB) and put them in a hash table. It then reads the contents of a second file (also about 10 GB) into buffer 2 (say 100 MB) and checks whether the elements of buffer2 are present in the hash table. The output of the result does not matter; at the moment I am only interested in the efficiency of the program. I need to read 8 bytes at a time from both files, so I use long long int. The problem is that my program is very inefficient. How can I make it efficient?

// I compile with g++ -o hash hash.c -std=c++0x

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/time.h>
#include <stdint.h>
#include <math.h>
#include <limits.h>
#include <iostream>
#include <algorithm>
#include <vector>
#include <unordered_map>

using namespace std;

typedef std::unordered_map<unsigned long long int, unsigned long long int> Mymap;

int main()
{
    uint64_t block_size1 = (400*1024*1024)/sizeof(long long int);   // block size of Table A - division used so the malloc statements below allocate the intended number of elements
    uint64_t block_size2 = (100*1024*1024)/sizeof(long long int);   // block size of Table B

    int i=0, j=0, k=0;
    uint64_t x, z, l=0;

    unsigned long long int *buffer1 = (unsigned long long int *)malloc(block_size1 * sizeof(long long int));
    unsigned long long int *buffer2 = (unsigned long long int *)malloc(block_size2 * sizeof(long long int));

    Mymap c1;                                    // Hash table
    //Mymap::iterator it;

    FILE *file1 = fopen64("10G1.bin","rb");      // Input is a binary file of 10 GB
    FILE *file2 = fopen64("10G2.bin","rb");

    printf("size of buffer1 : %llu \n", block_size1 * sizeof(long long int));
    printf("size of buffer2 : %llu \n", block_size2 * sizeof(long long int));

    while(!feof(file1))
    {
        k++;
        printf("Iterations completed : %d \n", k);

        fread(buffer1, sizeof(long long int), block_size1, file1);   // Reading the contents into the memory block from the first file

        for (x=0; x<block_size1; x++)
            c1.insert(Mymap::value_type(buffer1[x], x));             // inserting values into the hash table

        // std::cout << "The size of the hash table is" << c1.size() * sizeof(Mymap::value_type) << "\n" << endl;

        /* // display contents of the hash table
        for (Mymap::const_iterator it = c1.begin(); it != c1.end(); ++it)
            std::cout << " [" << it->first << ", " << it->second << "]";
        std::cout << std::endl;
        */

        while(!feof(file2))
        {
            i++;                                 // Counting the number of iterations
            // printf("%d\n", i);

            fread(buffer2, sizeof(long long int), block_size2, file2);   // Reading the contents into the memory block from the second file

            for (z=0; z<block_size2; z++)
                c1.find(buffer2[z]);             // finding the element in the hash table
            //  if((c1.find(buffer2[z]) != c1.end()) == true)   // To check the correctness of the code
            //      l++;
            //  printf("The number of elements equal are : %llu\n", l);   // If input files have exactly the same contents, "l" should print out block_size2
            //  l=0;
        }

        rewind(file2);
        c1.clear();                              // clear the contents of the hash table
    }

    free(buffer1);
    free(buffer2);
    fclose(file1);
    fclose(file2);
}

Update:

Is it possible to read a chunk (say 400 MB) from a file and put it directly into a hash table using C++ streams? I think that could further reduce the overhead.

+6
c++ performance
6 answers

The running time of your program is O(l1 × bs1 × l2 × bs2) (where l1 is the number of lines in the first file, bs1 is the block size for the first buffer, l2 is the number of lines in the second file, and bs2 is the block size for the second buffer), since you have four nested loops. Since your block sizes are constant, you can say your order is O(n × 400 × m × 400) = O(160000mn), or in the worst case O(160000n²), which essentially ends up being O(n²).

You could have an O(n) algorithm if you do something like this (pseudocode follows):

map = new Map()
duplicate = new List()
unique = new List()

for each line in file1
    map.put(line, true)
end for

for each line in file2
    if (map.get(line))
        duplicate.add(line)
    else
        unique.add(line)
    fi
end for

Now duplicate will contain a list of duplicate elements, and unique will contain a list of unique elements.
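A minimal C++ sketch of this single-pass version, adapted to the 8-byte binary records from the question (file names and block sizes are just placeholders, and error checking is omitted):

#include <cstdint>
#include <cstdio>
#include <unordered_set>
#include <vector>

int main() {
    const size_t block = (400ULL * 1024 * 1024) / sizeof(uint64_t);
    std::vector<uint64_t> buf(block);
    std::unordered_set<uint64_t> seen;

    // Pass 1: build the hash set from the first file, one block at a time.
    FILE* f1 = std::fopen("10G1.bin", "rb");
    size_t n;
    while ((n = std::fread(buf.data(), sizeof(uint64_t), buf.size(), f1)) > 0)
        seen.insert(buf.begin(), buf.begin() + n);
    std::fclose(f1);

    // Pass 2: probe the set with the second file; count matches.
    FILE* f2 = std::fopen("10G2.bin", "rb");
    uint64_t duplicates = 0;
    while ((n = std::fread(buf.data(), sizeof(uint64_t), buf.size(), f2)) > 0)
        for (size_t i = 0; i < n; ++i)
            if (seen.count(buf[i])) ++duplicates;
    std::fclose(f2);

    std::printf("duplicates: %llu\n", (unsigned long long)duplicates);
}

The key difference from the original program is that the second file is read exactly once, and the lookups actually use the result of the hash probe; the cost, as noted below, is that all of the first file's keys have to stay in memory.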

In your original algorithm, you unnecessarily go through the second file for each line in the first file. This way you actually lose the advantage of the hash (which gives you O(1) lookup time). The trade-off in this case, of course, is that you have to keep all 10 GB in memory, which is probably not feasible. Usually in such cases the trade-off is between runtime and memory.

There is probably a better way to do this. I need to think about this a little more. If not, I'm sure someone will come up with a better idea :).

UPDATE

You can probably reduce the memory usage if you can find a good way to hash the line (which you read from the first file) to get a unique value (i.e., a 1-to-1 mapping between line and hash value). Essentially you would do something like this:

for each line in file1
    map.put(hash(line), true)
end for

for each line in file2
    if (map.get(hash(line)))
        duplicate.add(line)
    else
        unique.add(line)
    fi
end for

Here, hash is the function that performs the hashing. This way you do not need to store all the lines in memory, only their hashed values, which may help a bit. Even then, in the worst case (where the two files are either identical or completely different), you can still end up with 10 GB in memory for the duplicate or unique list. You can deal with that, at the cost of some information, by keeping only the count of unique or duplicate elements instead of the elements themselves.
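A rough C++ sketch of that idea, assuming line-oriented text input and accepting that std::hash is not a true 1-to-1 mapping, so occasional collisions can miscount (the file names are placeholders):

#include <cstdint>
#include <cstdio>
#include <fstream>
#include <functional>
#include <string>
#include <unordered_set>

int main() {
    std::unordered_set<std::size_t> hashes;   // stores only hash values, not the lines
    std::hash<std::string> h;
    std::string line;

    // First file: remember the hash of every line.
    std::ifstream f1("file1.txt");
    while (std::getline(f1, line))
        hashes.insert(h(line));

    // Second file: keep only counts instead of the lines themselves.
    std::uint64_t duplicate = 0, unique = 0;
    std::ifstream f2("file2.txt");
    while (std::getline(f2, line)) {
        if (hashes.count(h(line)))
            ++duplicate;
        else
            ++unique;
    }

    std::printf("duplicate: %llu  unique: %llu\n",
                (unsigned long long)duplicate, (unsigned long long)unique);
}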

+2

If you are using fread, try using setvbuf(). The default buffers used by the standard library file I/O calls are tiny (often around 4 KB). When processing large amounts of data quickly, you will be I/O bound, and the overhead of fetching many small buffers of data can become a significant bottleneck. Set this to a larger size (for example 64 KB or 256 KB) and you can reduce that overhead and may see significant improvements - try a few values to see where you get the biggest gain, as you will hit diminishing returns.
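A minimal sketch of that (the 256 KB size is just an example to tune):

#include <cstdio>
#include <cstdlib>

int main() {
    FILE* file1 = fopen("10G1.bin", "rb");
    // Give stdio a 256 KB buffer instead of the tiny default.
    // setvbuf must be called after fopen and before any I/O on the stream.
    static char iobuf[256 * 1024];
    if (file1 == NULL || setvbuf(file1, iobuf, _IOFBF, sizeof(iobuf)) != 0) {
        perror("fopen/setvbuf");
        return EXIT_FAILURE;
    }
    // ... fread() calls on file1 now go through the larger buffer ...
    fclose(file1);
}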

+3

long long int *ptr = mmap() your files, then compare them with memcmp() in chunks. Once a mismatch is found, go back over that chunk and compare it in more detail (in more detail here means long long int by long long int).

If you expect mismatches to be frequent, don't bother with memcmp(); just write your own loop comparing the long long ints to each other.
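A rough sketch of the mmap()/memcmp() approach, assuming a positional comparison of two equally sized files on a POSIX system (error handling omitted):

#include <cstdint>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main() {
    int fd1 = open("10G1.bin", O_RDONLY);
    int fd2 = open("10G2.bin", O_RDONLY);
    struct stat st;
    fstat(fd1, &st);
    size_t len = st.st_size;

    const uint64_t* a = (const uint64_t*)mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd1, 0);
    const uint64_t* b = (const uint64_t*)mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd2, 0);

    const size_t chunk = 1 << 20;   // compare 1 MB at a time
    for (size_t off = 0; off < len; off += chunk) {
        size_t n = (len - off < chunk) ? len - off : chunk;
        if (memcmp((const char*)a + off, (const char*)b + off, n) != 0) {
            // Mismatch somewhere in this chunk: narrow it down value by value.
            const uint64_t* x = (const uint64_t*)((const char*)a + off);
            const uint64_t* y = (const uint64_t*)((const char*)b + off);
            for (size_t i = 0; i < n / sizeof(uint64_t); ++i)
                if (x[i] != y[i]) { /* handle the differing pair here */ }
        }
    }

    munmap((void*)a, len); munmap((void*)b, len);
    close(fd1); close(fd2);
}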

+1

The only way to find out is to profile it, for example with gprof. Create a benchmark of your current implementation, then experiment with other modifications methodically and re-run the benchmark.
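With gprof that means compiling and linking with -pg, running the program, and inspecting the gmon.out it produces. Even without a profiler, a coarse wall-clock timer around each phase gives a usable baseline; a minimal sketch, assuming the two phases from the question's code:

#include <chrono>
#include <cstdio>

int main() {
    using clock = std::chrono::steady_clock;

    auto t0 = clock::now();
    // ... build the hash table from file 1 ...
    auto t1 = clock::now();
    // ... probe it with file 2 ...
    auto t2 = clock::now();

    std::printf("build: %.2f s, probe: %.2f s\n",
                std::chrono::duration<double>(t1 - t0).count(),
                std::chrono::duration<double>(t2 - t1).count());
}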

0

I bet that if you read in larger chunks you will get better performance: fread() and process multiple blocks per pass.

0

The problem I see is that you are reading the second file n times. That is what makes it so slow.

The best way to make this faster is to pre-sort the files and then do a sort-merge join. The sort is almost always worth it, in my experience.
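A sketch of the merge step on in-memory blocks (a full solution for 10 GB files would need an external sort first; values within each input are assumed distinct so the count is exact):

#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

// Count values common to both inputs by sorting them and doing a single merge pass.
uint64_t merge_count(std::vector<uint64_t> a, std::vector<uint64_t> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    uint64_t matches = 0;
    size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j])       ++i;
        else if (b[j] < a[i])  ++j;
        else { ++matches; ++i; ++j; }   // equal: advance both sides
    }
    return matches;
}

int main() {
    std::vector<uint64_t> a = {5, 1, 9, 7};
    std::vector<uint64_t> b = {7, 2, 5, 8};
    std::printf("matches: %llu\n", (unsigned long long)merge_count(a, b));   // prints 2
}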

0
