OpenMP 4.0 - GCC 5.2.0 - Overlapping Offload and Host Execution

Question


I am trying to test a very simple program that uses GCC 5 to offload computation through OpenMP 4.0 directives. My goal is to write two independent tasks, with one task executed on the accelerator (i.e., the Intel MIC emulator) and the other executed concurrently on the CPU.

Here is the code:

    #include <omp.h>
    #include <stdio.h>

    #define limit 100000

    int main(int argc, char** argv)
    {
        int cpu_prime, acc_prime;

        #pragma omp task shared(acc_prime)
        {
            #pragma omp target map(tofrom: acc_prime)
            {
                printf("mjf-dbg >> acc computation\n");
                int i, j;
                acc_prime=0;
                for(i=0; i<limit; i++){
                    for(j=2; j<=i; j++){
                        if(i%j==0) break;
                    }
                    if(j==i) acc_prime = i;
                }
                printf("mjf-dbg << acc computation\n");
            }
        }

        #pragma omp task shared(cpu_prime)
        {
            int i, j;
            cpu_prime=0;
            printf("mjf-dbg >> cpu computation\n");
            for(i=0; i<limit; i++){
                for(j=2; j<=i; j++){
                    if(i%j==0) break;
                }
                if(j==i) cpu_prime = i;
            }
            printf("mjf-dbg << cpu computation\n");
        }

        #pragma omp taskwait

        printf("cpu prime: %d \n", cpu_prime);
        printf("gpu prime: %d \n", acc_prime);
    }

With this code, I was expecting the following execution flow:

The master thread (MT) encounters the first explicit task region, is tied to that task, and begins executing it.
MT encounters the target directive, offloads the target block to the accelerator, and reaches a scheduling point.
MT returns to the implicit task region.
MT encounters the second explicit task region, is tied to that task, and begins executing it.
MT performs the host computation in parallel with the offloaded computation on the accelerator device.
MT returns to the implicit task region and reaches the scheduling point implied by the taskwait directive.
MT returns to the first explicit task region and waits for the offloaded block to finish.

Compile and run:

    gcc -fopenmp -foffload="-march=knl" overlap.c -o overlap
    OFFLOAD_EMUL_RUN="sde -knl --" ./overlap

Output:

    mjf-dbg >> acc computation
    mjf-dbg << acc computation
    mjf-dbg >> cpu computation
    mjf-dbg << cpu computation
    cpu prime: 99991
    gpu prime: 99991

This is not the result I expected, since it means that the master thread waits for the offloaded computation to complete before scheduling the host task. Instead, I was looking for something like this:

    mjf-dbg >> acc computation
    mjf-dbg >> cpu computation
    mjf-dbg << cpu computation
    mjf-dbg << acc computation
    cpu prime: 99991
    gpu prime: 99991

The offload emulator works correctly: at runtime I can see the _offload_target process go to 100% CPU usage while the program executes the target block.

So the question is: does anyone have an idea why the two tasks are serialized instead of being executed in parallel (one in the host process and the other in the _offload_target emulation process)?


c multithreading openmp hardware-acceleration xeon-phi

Majac89 Aug 19 '15 at 10:39


1 answer

Jonathan Dursi · Answer 1 · 2015-08-19T20:32:54+0000

There is a more fundamental (and simpler) problem here than the offloading: your tasks are not in a parallel region.

OpenMP tasks have to be inside a parallel region, although they are typically nested inside an omp single so that only one thread creates them.

So this program:

    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char** argv)
    {
        #pragma omp task
        {
            printf("task 1 starts\n");
            sleep(3);
            printf("task 1 ends\n");
        }

        #pragma omp task
        {
            printf("task 2 starts\n");
            sleep(1);
            printf("task 2 ends\n");
        }

        return 0;
    }

runs the tasks serially:

    $ gcc -fopenmp brokentasks.c -o brokentasks
    $ export OMP_NUM_THREADS=2
    $ ./brokentasks
    task 1 starts
    task 1 ends
    task 2 starts
    task 2 ends

whereas with the tasks inside a parallel region, like this:

    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char** argv)
    {
        #pragma omp parallel
        #pragma omp single
        {
            #pragma omp task
            {
                printf("task 1 starts\n");
                sleep(3);
                printf("task 1 ends\n");
            }

            #pragma omp task
            {
                printf("task 2 starts\n");
                sleep(1);
                printf("task 2 ends\n");
            }
        }
    }

it works as expected:

    $ gcc -fopenmp tasks.c -o tasks
    jdursi@odw-jdursi:~/tmp$ ./tasks
    task 2 starts
    task 1 starts
    task 2 ends
    task 1 ends
