I think the way to measure overhead is to time both the serial and parallel versions, and then see how far the parallel version is from the "ideal" run time for your number of threads.
So, for example, if your serial version takes 10 seconds and you have 4 threads on 4 cores, then your ideal run time is 2.5 seconds. If your OpenMP version takes 4 seconds, your "overhead" is 1.5 seconds. I put "overhead" in quotes because some of it will be thread creation and memory sharing (actual overhead), and some of it will just be non-parallel sections of code. I'm thinking here in terms of Amdahl's Law.
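To make that arithmetic concrete, here is a small sketch using the numbers from the example above (the step that inverts Amdahl's Law to recover the parallel fraction is my addition, not something you have to do):

#include <cstdio>

int main() {
    const double serial_time   = 10.0;  // seconds, measured
    const double parallel_time = 4.0;   // seconds, measured
    const int    num_threads   = 4;

    const double ideal_time = serial_time / num_threads;   // 2.5 s with perfect scaling
    const double overhead   = parallel_time - ideal_time;  // 1.5 s of "overhead"

    // Amdahl's Law: with parallel fraction p, speedup on n threads is
    // 1 / ((1 - p) + p / n). Solving for p from the measured speedup:
    const double speedup = serial_time / parallel_time;    // 2.5x here
    const double p = (1.0 - 1.0 / speedup) / (1.0 - 1.0 / num_threads);  // 0.8

    printf("ideal=%.2fs overhead=%.2fs speedup=%.2fx parallel fraction=%.2f\n",
           ideal_time, overhead, speedup, p);
    return 0;
}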
Here are two examples to demonstrate. They don't measure the cost of creating threads, but they do show the difference between the expected and the achieved improvement. And while Mystic is right that the only real way to measure is to time it, even trivial examples like your for loop are not necessarily memory bound. OpenMP does a lot of work behind the scenes that we don't see.
Serial (speedtest.cpp)
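(A minimal sketch; the array size and loop bodies are assumptions, chosen to match the a[99999999]=... output shown below.)

#include <cstdio>

int main() {
    const int SIZE = 100000000;   // assumed: 100M ints, matching the a[99999999] output
    int *a = new int[SIZE];

    // First pass: fill with 0s, then print the last element.
    for (int i = 0; i < SIZE; i++)
        a[i] = 0;
    printf("a[%d]=%d\n", SIZE - 1, a[SIZE - 1]);

    // Second pass: fill with 1s, then print again.
    for (int i = 0; i < SIZE; i++)
        a[i] = 1;
    printf("a[%d]=%d\n", SIZE - 1, a[SIZE - 1]);

    delete[] a;
    return 0;
}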
Parallel (omp_speedtest.cpp)
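(Again a sketch: the same two loops with #pragma omp parallel for on each, plus an omp_get_num_procs() call that produces the "There are 4 procs." line in the output below.)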
#include <omp.h>
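#include <cstdio>

int main() {
    const int SIZE = 100000000;   // same assumed size as the serial version

    // Matches the "There are 4 procs." line in the output below.
    printf("There are %d procs.\n", omp_get_num_procs());

    int *a = new int[SIZE];

    // Same two passes as the serial version, each loop split across the threads.
    #pragma omp parallel for
    for (int i = 0; i < SIZE; i++)
        a[i] = 0;
    printf("a[%d]=%d\n", SIZE - 1, a[SIZE - 1]);

    #pragma omp parallel for
    for (int i = 0; i < SIZE; i++)
        a[i] = 1;
    printf("a[%d]=%d\n", SIZE - 1, a[SIZE - 1]);

    delete[] a;
    return 0;
}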
So I compiled them with
g++ -O3 -o speedtest.exe speedtest.cpp
g++ -fopenmp -O3 -o omp_speedtest.exe omp_speedtest.cpp
And when I ran them
$ time ./speedtest.exe
a[99999999]=0
a[99999999]=1

real    0m1.379s
user    0m0.015s
sys     0m0.000s

$ time ./omp_speedtest.exe
There are 4 procs.
a[99999999]=0
a[99999999]=1

real    0m0.854s
user    0m0.015s
sys     0m0.015s

So the achieved speedup is about 1.379 / 0.854 ≈ 1.6x on 4 cores, well short of the ideal 4x; that gap is the "overhead" in the sense above.