OpenMP: parallel for inside a section

I would like to run the code below. I want to create two independent threads, each of which starts its own parallel loop. However, I am getting an error: apparently a parallel for cannot be spawned inside a section. How can I solve this?

    #include <omp.h>
    #include <stdio.h>

    int main() {
        omp_set_num_threads(10);
        #pragma omp parallel
        #pragma omp sections
        {
            #pragma omp section
            #pragma omp for
            for (int i = 0; i < 5; i++) {
                printf("x %d\n", i);
            }

            #pragma omp section
            #pragma omp for
            for (int i = 0; i < 5; i++) {
                printf(". %d\n", i);
            }
        } // end sections and end parallel
    }

And the error:

    main.cpp: In function 'int main()':
    main.cpp:14:9: warning: work-sharing region may not be closely nested inside of work-sharing, critical, ordered, master or explicit task region [enabled by default]
    main.cpp:20:9: warning: work-sharing region may not be closely nested inside of work-sharing, critical, ordered, master or explicit task region [enabled by default]
3 answers

By default, OpenMP does not create parallel regions inside parallel regions. OpenMP creates a team of num_threads threads at the start of the program; in non-parallel regions the remaining threads are unused and sleep. This design was chosen because repeatedly spawning new threads is slow compared to waking sleeping ones.

Therefore, you should simply parallelize the loops one after the other:

    #include <omp.h>
    #include <stdio.h>

    int main() {
        omp_set_num_threads(10);

        #pragma omp parallel for
        for (int i = 0; i < 5; i++) {
            printf("x %d\n", i);
        }

        #pragma omp parallel for
        for (int i = 0; i < 5; i++) {
            printf(". %d\n", i);
        }
    }

Here you should use nested parallelism. The problem with omp for inside sections is that every thread of the enclosing team must participate in the omp for, but the threads obviously cannot: they have been split up across the sections. So you must move the loops into functions and use nested parallelism inside those functions.

    #include <stdio.h>
    #include <omp.h>

    void doTask1(const int gtid) {
        omp_set_num_threads(5);
        #pragma omp parallel
        {
            int tid = omp_get_thread_num();
            #pragma omp for
            for (int i = 0; i < 5; i++) {
                printf("x %d %d %d\n", i, tid, gtid);
            }
        }
    }

    void doTask2(const int gtid) {
        omp_set_num_threads(5);
        #pragma omp parallel
        {
            int tid = omp_get_thread_num();
            #pragma omp for
            for (int i = 0; i < 5; i++) {
                printf(". %d %d %d\n", i, tid, gtid);
            }
        }
    }

    int main() {
        omp_set_num_threads(2);
        omp_set_nested(1);
        #pragma omp parallel
        {
            int gtid = omp_get_thread_num();
            #pragma omp sections
            {
                #pragma omp section
                doTask1(gtid);

                #pragma omp section
                doTask2(gtid);
            } // end sections and end parallel
        }
    }

The near-optimal number of threads equals the number of available processor cores. So each parallel for should run across all available cores, which is impossible inside omp sections; what you are trying to achieve is therefore not optimal. tune2fs's suggestion to run the two loops without sections makes sense and gives the best performance. You can run parallel loops inside other functions, but this "trick" provides no performance improvement.

