Signal Processing in the OpenMP Parallel Program

Question


I have a program that uses a POSIX timer (timer_create()). In essence, the program sets a timer and starts performing some lengthy (potentially infinite) computation. When the timer expires and the signal handler is called, the handler prints the best result computed so far and exits.

I am considering parallelizing the computation with OpenMP, because that should speed it up.

There are special functions in pthreads, for example, for setting signal masks for my threads. Does OpenMP provide such control, or should I accept the fact that the signal can be delivered to any of the threads created by OpenMP?

Also, if I'm currently in a parallel section of my code and my handler is called, can it still safely kill the application (exit(0);) and do things like acquiring OpenMP locks?


c pthreads signals openmp signal-handling

user7610 Nov 15 '11 at 14:47


2 answers

It's a bit late, but hopefully this code example will help others in a similar situation!

As osgx mentioned, OpenMP says nothing about signals, but since OpenMP is often implemented with pthreads on POSIX systems, we can use the pthread approach to signals.

For heavy computations using OpenMP, there are likely only a few points where the computation can be safely halted. Therefore, if you want to obtain early results, synchronous signal handling is a safe way to do it. An additional advantage is that it lets us accept the signal in a specific OpenMP thread (in the example code below, we choose the master thread). On catching the signal, we simply set a flag indicating that computation should stop. Each thread should then periodically check this flag when convenient and wrap up its share of the workload.

With this synchronous approach, we allow the computation to finish gracefully and with minimal changes to the algorithm. A signal-handler approach, on the other hand, may be inappropriate, since it would likely be difficult to collate the current working state of each thread into a coherent result. One drawback of the synchronous approach is that the computation can take a noticeable amount of time to come to a stop.

The signal-checking apparatus consists of three parts:

Block the relevant signals. This must be done outside the omp parallel region so that every OpenMP thread (pthread) inherits the same blocking behavior.
Poll for the desired signals from the master thread. You can use sigtimedwait for this, but some systems (such as macOS) do not support it. More portably, we can use sigpending to poll for any blocked signals, then double-check that the pending signals are the ones we expect before accepting them synchronously with sigwait (which should return immediately here, unless some other part of the program is creating a race condition). Finally, we set the relevant flag.
Restore the original signal mask at the end (optionally with one last check for signals).
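On systems that do provide sigtimedwait, the polling step can be collapsed into a single call with a zero timeout. The following is only a minimal sketch of that variant: the helper names poll_blocked_signal and demo_poll are made up for illustration, SIGUSR1 is an arbitrary choice for the self-check, and the signals are assumed to be blocked already in the calling thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <time.h>

/* Hypothetical helper: returns the number of a pending signal from `set`
 * (consuming it), or 0 if none is pending. Assumes the signals in `set`
 * are already blocked in the calling thread. */
static int poll_blocked_signal(const sigset_t *set)
{
    const struct timespec no_wait = {0, 0};  /* zero timeout: never block */
    int sig = sigtimedwait(set, NULL, &no_wait);
    return sig > 0 ? sig : 0;                /* -1 means nothing was pending */
}

/* Hypothetical self-check: block SIGUSR1, raise it, and poll it back. */
static int demo_poll(void)
{
    sigset_t set, old;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    int rc = sigprocmask(SIG_BLOCK, &set, &old);
    assert(rc == 0);

    int before = poll_blocked_signal(&set);  /* nothing pending yet */
    raise(SIGUSR1);                          /* stays pending: it is blocked */
    int after = poll_blocked_signal(&set);   /* accepted synchronously here */

    sigprocmask(SIG_SETMASK, &old, NULL);    /* restore the original mask */
    return before == 0 && after == SIGUSR1;
}
```

The sigpending plus sigwait combination described above remains the more portable choice.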

There are several important performance considerations and caveats:

Assuming that each inner loop iteration is small, making the signal-checking system calls is expensive. In the example code, we check for signals only once every 10 million iterations (per thread), which corresponds to perhaps a couple of seconds of wall-clock time.
omp for loops cannot be broken out of ¹, so you must either spin through the remaining iterations or rewrite the loop using more basic OpenMP primitives. Regular loops (such as the inner loops of an outer parallel loop) can be broken out of just fine.
If only the master thread can check for signals, this may be a problem in programs where the master thread finishes well before the other threads. In that case, the other threads become uninterruptible. To solve this, you could "pass the baton" of signal checking as each thread completes its workload, or the master thread could be made to keep running and polling until all other threads have finished ².
On some architectures, such as NUMA HPC systems, checking the "global" signalled flag can be quite expensive, so take care when deciding when and where to check or modify it. In the spin loop section, for example, you may wish to cache the flag locally once it becomes true.

Here is a sample code:

    #include <omp.h>
    #include <signal.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    void calculate()
    {
        _Bool signalled = false;
        int sigcaught;
        size_t steps_tot = 0;

        // block signals of interest (SIGINT and SIGTERM here)
        sigset_t oldmask, newmask, sigpend;
        sigemptyset(&newmask);
        sigaddset(&newmask, SIGINT);
        sigaddset(&newmask, SIGTERM);
        sigprocmask(SIG_BLOCK, &newmask, &oldmask);

        #pragma omp parallel
        {
            int rank = omp_get_thread_num();
            size_t steps = 0;

            // keep improving result forever, unless signalled
            while (!signalled) {
                #pragma omp for
                for (size_t i = 0; i < 10000; i++) {
                    // we can't break from an omp for loop...
                    // instead, spin away the rest of the iterations
                    if (signalled) continue;

                    for (size_t j = 0; j < 1000000; j++, steps++) {
                        // ***
                        // heavy computation...
                        // ***

                        // check for signal every 10 million steps
                        if (steps % 10000000 == 0) {
                            // master thread; poll for signal
                            if (rank == 0) {
                                sigpending(&sigpend);
                                if (sigismember(&sigpend, SIGINT) || sigismember(&sigpend, SIGTERM)) {
                                    if (sigwait(&newmask, &sigcaught) == 0) {
                                        printf("Interrupted by %d...\n", sigcaught);
                                        signalled = true;
                                    }
                                }
                            }

                            // all threads; stop computing
                            if (signalled) break;
                        }
                    }
                }
            }

            #pragma omp atomic
            steps_tot += steps;
        }

        printf("The result is ... after %zu steps\n", steps_tot);

        // optional cleanup
        sigprocmask(SIG_SETMASK, &oldmask, NULL);
    }

If you use C++, you may find the following class useful...

    #include <signal.h>
    #include <vector>

    class Unterminable {
        sigset_t oldmask, newmask;
        std::vector<int> signals;

    public:
        Unterminable(std::vector<int> signals) : signals(signals) {
            sigemptyset(&newmask);
            for (int signal : signals)
                sigaddset(&newmask, signal);
            sigprocmask(SIG_BLOCK, &newmask, &oldmask);
        }

        Unterminable() : Unterminable({SIGINT, SIGTERM}) {}

        // this can be made more efficient by using sigandset,
        // but sigandset is not particularly portable
        int poll() {
            sigset_t sigpend;
            sigpending(&sigpend);
            for (int signal : signals) {
                if (sigismember(&sigpend, signal)) {
                    int sigret;
                    if (sigwait(&newmask, &sigret) == 0)
                        return sigret;
                    break;
                }
            }
            return -1;
        }

        ~Unterminable() {
            sigprocmask(SIG_SETMASK, &oldmask, NULL);
        }
    };

The blocking part of calculate() can then be replaced by Unterminable unterm; (without parentheses, since Unterminable unterm(); would declare a function), and the signal-checking part by if ((sigcaught = unterm.poll()) > 0) {...}. The signals are unblocked automatically when unterm goes out of scope.

¹ This is not strictly true. OpenMP offers limited support for a "parallel break" in the form of cancellation points. If you choose to use cancellation points in your parallel loops, make sure you know exactly where the implicit cancellation points are, so that your computation data is coherent upon cancellation.
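As a rough illustration of footnote ¹, here is a minimal sketch of a loop that uses the cancel and cancellation point directives (OpenMP 4.0 and later). The function name cancellable_loop and its parameters are invented for this example. Note that most runtimes only honor cancellation when it is enabled, typically through the OMP_CANCELLATION environment variable; otherwise the directives are ignored and the loop simply runs to completion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: run up to n iterations, requesting cancellation
 * once iteration `stop_at` has been processed. Returns how many
 * iterations actually did their work. */
static size_t cancellable_loop(size_t n, size_t stop_at)
{
    size_t done = 0;
    assert(n > 0);

    #pragma omp parallel
    {
        #pragma omp for
        for (size_t i = 0; i < n; i++) {
            /* Threads leave the loop here once cancellation is active. */
            #pragma omp cancellation point for

            /* ... one unit of work would go here ... */
            #pragma omp atomic
            done++;

            if (i == stop_at) {
                /* Ask all threads to abandon the rest of this loop. */
                #pragma omp cancel for
            }
        }
    }
    return done;
}
```

With cancellation disabled (or without OpenMP at all) the function returns n; with it enabled, the result depends on how far the other threads got before noticing the request.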

² Personally, I keep a count of how many threads have finished the for loop. If the master thread completes the loop without catching a signal, it keeps polling for signals until it either catches one or all threads have finished the loop. To do this, make sure to mark the for loop nowait.


nitrous May 01 '19 at 12:22


osgx · Accepted Answer · 2011-11-16T18:44:59+0000

The OpenMP 3.1 standard says nothing about signals.

As far as I know, every popular OpenMP implementation on Linux/UNIX is based on pthreads, so an OpenMP thread is a pthread, and the generic rules for pthreads and signals apply.

Does OpenMP Provide Such Control

There is no OpenMP-specific control, but you can try using the pthread controls. The only difficulty is knowing how many OpenMP threads are in use and where to place the controlling call.

Can a signal be delivered to any of the threads created by OpenMP?

By default, yes, it can be delivered to any thread.

my handler is called,

The usual rules for signal handlers are still applicable. The functions allowed in the signal handler are listed at http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html (at the end of the page)

And printf is not allowed (write is). You can use printf if you know that no thread is inside printf at the moment the signal arrives (for example, if you have no printf calls in the parallel region).

can it safely kill the application (exit(0);)

Yes, it can: abort() and _exit() are allowed from a handler.

Linux / Unix terminates all threads when any thread executes exit or abort .

and do things like acquiring OpenMP locks?

It should not, but if you know that the lock will not be held at the moment the signal handler runs, you can try to do this.

!! UPDATE

There is an example of adapting signal handling to OpenMP at http://www.cs.colostate.edu/~cs675/OpenMPvsThreads.pdf ("OpenMP versus Threading in C/C++"). In short: set a flag in the handler and add checks of this flag in every thread at every Nth loop iteration.

Adapting a signal-based exception mechanism to a parallel region
Something that occurs more with C/C++ applications than with Fortran applications is that the program uses a sophisticated user interface. Genehunter is a simple example where the user may interrupt the computation of one family tree by pressing control-C so that it can go on to the next family tree in a clinical database about the disease. The premature termination is handled in the serial version by a C++-like exception mechanism involving a signal handler, setjump, and longjump. OpenMP does not permit unstructured control flow to cross a parallel construct boundary. We modified the exception handling in the OpenMP version by changing the interrupt handler into a polling mechanism. The thread that catches the control-C signal sets a shared flag. All threads check the flag at the beginning of the loop by calling the routine has_hit_interrupt() and skip the iteration if it is set. When the loop ends, the master checks the flag and can easily execute the longjump to complete the exceptional exit (see Figure 1).
