OpenMP task with and without a parallel region

    Node *head = &node1;
    while (head) {
        #pragma omp task
        cout << head->value << endl;
        head = head->next;
    }

    #pragma omp parallel
    {
        #pragma omp single
        {
            Node *head = &node1;
            while (head) {
                #pragma omp task
                cout << head->value << endl;
                head = head->next;
            }
        }
    }

In the first block I simply create tasks without a parallel directive, while in the second block I use a parallel directive together with a single directive, which is the usual pattern I have seen in papers. I wonder what the difference between the two is? By the way, I know the basic meaning of these directives.

Code in my comment:

    void traverse(node *root) {
        if (root->left) {
            #pragma omp task
            traverse(root->left);
        }
        if (root->right) {
            #pragma omp task
            traverse(root->right);
        }
        process(root);
    }
1 answer

The difference is that in the first block you do not actually create any tasks, since the block itself is not nested (neither syntactically nor dynamically) inside an active parallel region. In the second block the task construct is syntactically nested inside the parallel region and will queue explicit tasks if the region happens to be active at run time (an active parallel region is one that executes with a team of more than one thread). Dynamic nesting is less obvious. Take a look at the following example:

    void foo(void) {
        int i;
        for (i = 0; i < 10; i++) {
            #pragma omp task
            bar();
        }
    }

    int main(void) {
        foo();

        #pragma omp parallel num_threads(4)
        {
            #pragma omp single
            foo();
        }

        return 0;
    }

The first call to foo() occurs outside of any parallel region. Therefore the task directive does (almost) nothing, and all calls to bar() happen serially. The second call to foo() comes from within the parallel region, and therefore new tasks will be generated inside foo(). The parallel region is active, since the number of threads was fixed at 4 by the num_threads(4) clause.
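As an aside (my sketch, not part of the original answer): the if clause of the parallel construct makes the active/inactive distinction easy to observe. With a false condition the region executes with a team of just one thread, i.e. it is inactive, and the tasks created inside foo() are, on typical implementations, executed immediately rather than deferred:

    // Sketch reusing foo() from the example above. The if(0) clause
    // forces an inactive parallel region (a team of one thread), so the
    // tasks created in foo() typically run immediately and in order.
    #pragma omp parallel num_threads(4) if(0)
    {
        #pragma omp single
        foo();
    }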

This behavior of the OpenMP directives is a design feature. The basic idea is to be able to write code that can run both serially and in parallel.
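To make this concrete with the traversal code from the question (my sketch, not from the original answer): the very same traverse() can be executed serially or in parallel without touching its source:

    // Serial: no enclosing active parallel region, so the task
    // directives inside traverse() are effectively no-ops.
    traverse(root);

    // Parallel: one thread walks the tree and generates tasks,
    // which the whole team then executes concurrently.
    #pragma omp parallel
    {
        #pragma omp single
        traverse(root);
    }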

Still, having the task construct in foo() involves some code transformation, e.g. foo() gets transformed into something like:

    void foo_omp_fn_1(void *omp_data) {
        bar();
    }

    void foo(void) {
        int i;
        for (i = 0; i < 10; i++)
            OMP_make_task(foo_omp_fn_1, NULL);
    }

Here OMP_make_task() is a hypothetical (not publicly available) function from the OpenMP support library that queues a call to the function supplied as its first argument. If OMP_make_task() detects that it is operating outside an active parallel region, it simply calls foo_omp_fn_1() directly. This adds some overhead to calling bar() in the serial case: instead of main -> foo -> bar, the call goes main -> foo -> OMP_make_task -> foo_omp_fn_1 -> bar. The consequence is slower execution of the serial code.
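A minimal sketch of what such a runtime shim might look like (my assumption; the real entry points, e.g. GOMP_task in GCC's libgomp, are considerably more involved). Note that omp_in_parallel() is a real OpenMP API call, while enqueue_task() is made up for this sketch:

    #include <omp.h>

    // Hypothetical runtime helper, for illustration only.
    void OMP_make_task(void (*fn)(void *), void *data)
    {
        if (omp_in_parallel())       // inside an active parallel region?
            enqueue_task(fn, data);  // hypothetical: defer to the task queue
        else
            fn(data);                // serial case: just call it directly
    }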

This is even more obvious with a worksharing directive:

    void foo(void) {
        int i;

        #pragma omp for
        for (i = 0; i < 12; i++)
            bar();
    }

    int main(void) {
        foo();

        #pragma omp parallel num_threads(4)
        {
            foo();
        }

        return 0;
    }

The first call to foo() runs the loop serially. The second call distributes the 12 iterations among the 4 threads, i.e. each thread executes only 3 iterations. Once again, some code-transformation magic is used to achieve this, and the serial loop will run slower than if #pragma omp for were not present in foo().
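For symmetry with the task example, here is a rough sketch (again my assumption, with made-up runtime names) of what the compiler might turn the worksharing version of foo() into:

    // Hypothetical transformation of the worksharing loop. The runtime
    // helper computes this thread's chunk of [0, 12); outside a parallel
    // region it would simply hand back the whole range.
    void foo(void)
    {
        int i, lb, ub;
        OMP_static_schedule(0, 12, &lb, &ub);  // made-up helper
        for (i = lb; i < ub; i++)
            bar();
        OMP_barrier();  // the implicit barrier at the end of omp for
    }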

The lesson here is never to add OpenMP constructs where they are not really needed.
