OpenMP Overhead Calculation

Given n threads, is there a way to calculate the amount of overhead (for example, the number of cycles) needed to implement a particular directive in OpenMP?

For example, given the code below

 #pragma omp parallel
 {
     #pragma omp for
     for (int i = 0; i < m; i++)
         a[i] = b[i] + c[i];
 }

Is there any way to calculate how much overhead is required to create these threads?

2 answers

Yes, you can. Check out the EPCC benchmarks. Although this code is somewhat old, it measures the various overheads of OpenMP constructs, including omp parallel for and omp critical.

The basic approach is quite simple. You measure the baseline serial time without OpenMP, then add the OpenMP pragma you want to measure and subtract the two elapsed times. This is exactly how the EPCC benchmark measures overhead; see syncbench.c in its source.

Note that overhead is expressed as time, not as a number of cycles. I also tried to measure the number of cycles, but the overhead of OpenMP parallel constructs may include time blocked on synchronization, so a cycle count may not reflect the real cost of OpenMP.


I think the way to measure overhead is to time both the serial and parallel versions, and then see how far the parallel version is from the "ideal" run time for your number of threads.

So, for example, if your serial version takes 10 seconds and you have 4 threads on 4 cores, then your ideal runtime is 2.5 seconds. If your OpenMP version takes 4 seconds, your "overhead" is 1.5 seconds. I put overhead in quotes because some of it will be thread creation and memory sharing (actual overhead), and some of it will simply be non-parallelized sections of code. I am thinking here in terms of Amdahl's law.

Here are two examples to demonstrate. They do not measure the cost of thread creation, but they can show the difference between the expected and achieved improvement. And while Mystic was right that the only real way to measure is timing, even trivial examples like your for loop are not necessarily memory bound. OpenMP does a lot of work that we don't see.

Serial (speedtest.cpp)

 #include <iostream>

 int main(int argc, char** argv) {
     const int SIZE = 100000000;
     int* a = new int[SIZE];
     int* b = new int[SIZE];
     int* c = new int[SIZE];

     for (int i = 0; i < SIZE; i++) {
         a[i] = b[i] * c[i] * 2;
     }
     std::cout << "a[" << (SIZE-1) << "]=" << a[SIZE-1] << std::endl;

     for (int i = 0; i < SIZE; i++) {
         a[i] = b[i] + c[i] + 1;
     }
     std::cout << "a[" << (SIZE-1) << "]=" << a[SIZE-1] << std::endl;

     delete[] a;
     delete[] b;
     delete[] c;
     return 0;
 }

Parallel (omp_speedtest.cpp)

 #include <omp.h>
 #include <iostream>

 int main(int argc, char** argv) {
     const int SIZE = 100000000;
     int* a = new int[SIZE];
     int* b = new int[SIZE];
     int* c = new int[SIZE];

     std::cout << "There are " << omp_get_num_procs() << " procs." << std::endl;

     #pragma omp parallel
     {
         #pragma omp for
         for (int i = 0; i < SIZE; i++) {
             a[i] = b[i] * c[i];
         }
     }
     std::cout << "a[" << (SIZE-1) << "]=" << a[SIZE-1] << std::endl;

     #pragma omp parallel
     {
         #pragma omp for
         for (int i = 0; i < SIZE; i++) {
             a[i] = b[i] + c[i] + 1;
         }
     }
     std::cout << "a[" << (SIZE-1) << "]=" << a[SIZE-1] << std::endl;

     delete[] a;
     delete[] b;
     delete[] c;
     return 0;
 }

So I compiled them with

 g++ -O3 -o speedtest.exe speedtest.cpp
 g++ -fopenmp -O3 -o omp_speedtest.exe omp_speedtest.cpp

And when I ran them

 $ time ./speedtest.exe
 a[99999999]=0
 a[99999999]=1

 real    0m1.379s
 user    0m0.015s
 sys     0m0.000s

 $ time ./omp_speedtest.exe
 There are 4 procs.
 a[99999999]=0
 a[99999999]=1

 real    0m0.854s
 user    0m0.015s
 sys     0m0.015s
