OpenMP Recursive Tasks

Consider the following program that calculates Fibonacci numbers.
It uses OpenMP tasks for parallelization.

#include <iostream>
#include <omp.h>
using namespace std;

int fib(int n) {
    if (n == 0 || n == 1)
        return n;
    int res, a, b;
    #pragma omp parallel
    {
        #pragma omp single
        {
            #pragma omp task shared(a)
            a = fib(n - 1);
            #pragma omp task shared(b)
            b = fib(n - 2);
            #pragma omp taskwait
            res = a + b;
        }
    }
    return res;
}

int main() { cout << fib(40); }

I am using GCC 4.8.2 on Fedora 20.
I compile the program with g++ -fopenmp name_of_program.cpp -Wall and run it, but htop shows only two (sometimes three) threads doing any work. The machine I run this program on has 8 logical processors. My question: what do I need to do to spread the work across all 8 threads? I tried export OMP_NESTED=TRUE, but then the program fails at startup with:
libgomp: Thread creation failed: Resource temporarily unavailable
The goal of my program is not to compute Fibonacci numbers efficiently, but to learn how to use tasks (or something similar) in OpenMP.

1 answer

With OMP_NESTED=FALSE, a thread team is allocated for the top-level parallel region, and no extra threads are allocated at the nested levels. The two top-level tasks can run on two different threads, but every nested parallel region inside them is executed by a team of one, so no more than two threads do useful work.
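
To see this behavior directly, here is a minimal demo (not from the original post) that prints the team size at each nesting level. With OMP_NESTED=FALSE the inner regions report a team of one; with OMP_NESTED=TRUE they get full teams:

#include <cstdio>
#include <omp.h>

int main() {
    #pragma omp parallel num_threads(2)
    {
        // Exactly one thread of the outer team prints its team size.
        #pragma omp single
        printf("outer team size: %d\n", omp_get_num_threads());

        // Each outer thread opens a nested region; with nesting
        // disabled, this region is serialized (team size 1).
        #pragma omp parallel num_threads(2)
        {
            #pragma omp single
            printf("inner team size: %d\n", omp_get_num_threads());
        }
    }
    return 0;
}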

With OMP_NESTED=TRUE, a thread team is allocated at every level. Your system has 8 logical processors, so the team size is most likely 8. A team includes the thread that enters the region, so each region launches only 7 new threads. The recursion tree for fib(n) has about fib(n) nodes. (A nice self-referential property of fib!) Thus the code can create roughly 7*fib(n) threads; with fib(40) = 102,334,155, that is on the order of 700 million threads, which quickly exhausts resources.

The fix is to use one parallel region around the entire task tree: move the omp parallel and omp single directives into main, outside of fib, as in the sketch below. That way a single thread team works on the whole task tree.
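
A sketch of the restructured program under that suggestion (the exact layout is my rendering, not code from the answer):

#include <iostream>
#include <omp.h>
using namespace std;

int fib(int n) {
    if (n == 0 || n == 1)
        return n;
    int a, b;
    // Child tasks write into the parent's stack variables,
    // hence shared(a) and shared(b).
    #pragma omp task shared(a)
    a = fib(n - 1);
    #pragma omp task shared(b)
    b = fib(n - 2);
    // Wait for both children before reading their results.
    #pragma omp taskwait
    return a + b;
}

int main() {
    int res;
    #pragma omp parallel
    {
        // One thread builds the task tree; the whole team executes it.
        #pragma omp single
        res = fib(40);
    }
    cout << res << endl;
}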

The general point is to distinguish potential parallelism from actual parallelism. The task directives express potential parallelism, which may or may not actually be exploited at run time. An omp parallel region (for all practical purposes) specifies actual parallelism. Typically you want the actual parallelism to match the available hardware so as not to oversubscribe the machine, while the potential parallelism should be much larger so that the runtime can balance the load.
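
One common refinement in that spirit (my addition, not part of the original answer) is to cap task overhead by recursing serially below a cutoff while still exposing plenty of tasks near the top of the tree. The cutoff of 20 here is an arbitrary assumption; this fib is a drop-in replacement for the one in the sketch above:

int fib(int n) {
    if (n == 0 || n == 1)
        return n;
    if (n < 20)  // assumed cutoff; tune for your machine
        return fib(n - 1) + fib(n - 2);  // plain serial recursion
    int a, b;
    #pragma omp task shared(a)
    a = fib(n - 1);
    #pragma omp task shared(b)
    b = fib(n - 2);
    #pragma omp taskwait
    return a + b;
}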
