How to parallelize reading lines from an input file when the lines are processed independently?

I just started with OpenMP using C++. My serial C++ code looks something like this:

#include <iostream>
#include <string>
#include <sstream>
#include <vector>
#include <fstream>
#include <stdlib.h>

int main(int argc, char* argv[])
{
    std::string line;
    std::ifstream inputfile(argv[1]);

    if (inputfile.is_open())
    {
        while (std::getline(inputfile, line))
        {
            // Line gets processed and written into an output file
        }
    }
}

Since each line is processed almost independently and the input file is on the order of gigabytes, I tried to use OpenMP to parallelize this. I assume that I first need to get the number of lines in the input file and then parallelize the code as follows. Can someone please help me here?

#include <iostream>
#include <string>
#include <sstream>
#include <vector>
#include <fstream>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

int main(int argc, char* argv[])
{
    std::string line;
    std::ifstream inputfile(argv[1]);

    if (inputfile.is_open())
    {
        // Calculate number of lines in file?
        // Set an output filename and open an ofstream
        #pragma omp parallel num_threads(8)
        {
            #pragma omp for schedule(dynamic, 1000)
            for (int i = 0; i < lines_in_file; i++)
            {
                // What do I do here? I cannot just read any line because it requires random access
            }
        }
    }
}

EDIT:

Important things

  • Each line is processed independently.
  • The order of the results does not matter.
c++ parallel-processing openmp
1 answer

Not a direct OpenMP answer, but you're probably looking for MapReduce. Take a look at Hadoop: it is written in Java, but it at least has a C++ API.

In general, you want to process this amount of data on different machines rather than in several threads of the same process (virtual address space limitations, lack of physical memory, swapping, etc.). In addition, the kernel has to read the file from disk sequentially anyway, which is what you want: otherwise the hard disk just has to do extra seeks for each of your threads.
