It's a bit late, but hopefully this code example will help others in a similar situation!
As osgx mentioned, OpenMP says nothing about signals, but since OpenMP is often implemented with pthreads on POSIX systems, we can use a pthread signal approach.
For heavy computation using OpenMP, it's likely that there are only a few places where the computation can safely be halted. Therefore, for the case where you want to obtain premature results, we can use synchronous signal handling to do this safely. An additional advantage is that this lets us accept the signal from a specific OpenMP thread (in the example code below, we choose the master thread). On catching the signal, we simply set a flag indicating that computation should stop. Each thread should then periodically check this flag when convenient, and then wrap up its share of the workload.
By using this synchronous approach, we allow the computation to exit gracefully and with minimal change to the algorithm. A signal handler approach, on the other hand, may not be appropriate, as it would likely be difficult to collate the current working states of each thread into a coherent result. One disadvantage of the synchronous approach, though, is that the computation can take a noticeable amount of time to come to a stop.
The signal checking apparatus consists of three parts:
- Blocking the relevant signals. This should be done outside of the omp parallel region so that each OpenMP thread (pthread) inherits the same blocking behaviour.
- Polling for the desired signals from the master thread. One can use sigtimedwait for this, but some systems (such as macOS) don't support it. More portably, we can use sigpending to poll for any blocked signals, and then double-check that the pending signals are the ones we're expecting before accepting them synchronously using sigwait (which should return immediately here, unless some other part of the program is creating a race condition). We finally set the relevant flag.
- Removing our signal mask at the end (optionally with one final check for signals).
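For systems that do provide sigtimedwait, the polling step above could be sketched roughly as follows. This is only a minimal sketch: the helper name poll_blocked_signals is made up for illustration, and it assumes the signals in the set have already been blocked as described in the first step.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper (not from the answer's code): returns the number of a
 * pending signal from `set`, or -1 if none is pending.
 * Assumes every signal in `set` is already blocked in the calling thread. */
int poll_blocked_signals(const sigset_t *set)
{
    /* A zero timeout turns sigtimedwait into a non-blocking poll. */
    const struct timespec no_wait = {0, 0};
    int sig = sigtimedwait(set, NULL, &no_wait);
    return sig > 0 ? sig : -1;
}
```

A pending signal is consumed by the call, so a second poll returns -1 again unless another signal has arrived in the meantime.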
There are several important performance considerations and caveats:
- Assuming that each inner loop iteration is small, performing the signal checking system calls is expensive. In the example code, we check for signals only every 10 million (per-thread) iterations, corresponding to perhaps a couple of seconds of wall time.
- omp for loops cannot be broken out of [1], so you must either spin for the remainder of the iterations or rewrite the loop using more basic OpenMP primitives. Regular loops (such as the inner loops of an outer parallel loop) can be broken out of just fine.
- If only the master thread can check for signals, this may create an issue in programs where the master thread finishes well before the other threads. In this scenario, those other threads will be uninterruptible. To address this, you could "pass the baton" of signal checking as each thread completes its workload, or the master thread could keep running and polling until all the other threads complete [2].
- On some architectures, such as NUMA HPC systems, checking the "global" signalled flag may be quite expensive, so take care when deciding when and where to check or manipulate the flag. For a spin loop section, for example, one may wish to cache the flag locally once it becomes true.
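The throttling and local caching mentioned in these caveats could be packaged into a small per-thread helper along the following lines. This is a sketch under assumptions, not part of the example that follows: the stop_checker type and should_stop function are hypothetical names, and it uses a C11 atomic_bool for the shared flag.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread helper: one instance per thread, all pointing at
 * the same shared flag. */
typedef struct {
    atomic_bool *shared_flag; /* the "global" signalled flag */
    size_t interval;          /* read the shared flag every `interval` calls */
    size_t steps;             /* calls made so far by this thread */
    bool cached;              /* local copy, sticky once it becomes true */
} stop_checker;

bool should_stop(stop_checker *c)
{
    /* Once stopped, never touch the shared flag again. */
    if (c->cached)
        return true;
    /* Skip the (possibly expensive) shared read on most calls. */
    if (++c->steps % c->interval != 0)
        return false;
    c->cached = atomic_load_explicit(c->shared_flag, memory_order_relaxed);
    return c->cached;
}
```

Each thread would call should_stop once per inner iteration, so the cost on most iterations is a local counter increment rather than a read of shared memory.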
Here is a complete example:
    #include <omp.h>
    #include <signal.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    void calculate(void)
    {
        bool signalled = false;
        int sigcaught;
        size_t steps_tot = 0;

        // block signals of interest (SIGINT and SIGTERM here)
        sigset_t oldmask, newmask, sigpend;
        sigemptyset(&newmask);
        sigaddset(&newmask, SIGINT);
        sigaddset(&newmask, SIGTERM);
        sigprocmask(SIG_BLOCK, &newmask, &oldmask);

        #pragma omp parallel
        {
            int rank = omp_get_thread_num();
            size_t steps = 0;

            // keep improving result forever, unless signalled
            while (!signalled) {
                #pragma omp for
                for (size_t i = 0; i < 10000; i++) {
                    // we can't break from an omp for loop...
                    // instead, spin away the rest of the iterations
                    if (signalled) continue;

                    for (size_t j = 0; j < 1000000; j++, steps++) {
                        // ***
                        // heavy computation...
                        // ***

                        // check for signal every 10 million steps
                        if (steps % 10000000 == 0) {
                            // master thread; poll for signal
                            if (rank == 0) {
                                sigpending(&sigpend);
                                if (sigismember(&sigpend, SIGINT) || sigismember(&sigpend, SIGTERM)) {
                                    if (sigwait(&newmask, &sigcaught) == 0) {
                                        printf("Interrupted by %d...\n", sigcaught);
                                        signalled = true;
                                    }
                                }
                            }

                            // all threads; stop computing
                            if (signalled) break;
                        }
                    }
                }
            }

            // add this thread's step count to the shared total
            #pragma omp atomic
            steps_tot += steps;
        }

        printf("The result is ... after %zu steps\n", steps_tot);

        // optional cleanup
        sigprocmask(SIG_SETMASK, &oldmask, NULL);
    }
If you use C++, you may find the following class useful...
    #include <signal.h>
    #include <vector>

    class Unterminable {
        sigset_t oldmask, newmask;
        std::vector<int> signals;

    public:
        Unterminable(std::vector<int> signals) : signals(signals) {
            sigemptyset(&newmask);
            for (int signal : signals)
                sigaddset(&newmask, signal);
            sigprocmask(SIG_BLOCK, &newmask, &oldmask);
        }

        Unterminable() : Unterminable({SIGINT, SIGTERM}) {}

        // this can be made more efficient by using sigandset,
        // but sigandset is not particularly portable
        int poll() {
            sigset_t sigpend;
            sigpending(&sigpend);
            for (int signal : signals) {
                if (sigismember(&sigpend, signal)) {
                    int sigret;
                    if (sigwait(&newmask, &sigret) == 0)
                        return sigret;
                    break;
                }
            }
            return -1;
        }

        ~Unterminable() {
            sigprocmask(SIG_SETMASK, &oldmask, NULL);
        }
    };
The blocking part of calculate() can then be replaced by Unterminable unterm; (without parentheses, since Unterminable unterm(); would declare a function rather than an object), and the signal checking part by if ((sigcaught = unterm.poll()) > 0) {...}. The signals are unblocked automatically when unterm goes out of scope.
[1] This is not strictly true. OpenMP has limited support for performing a "parallel break" in the form of cancellation points. If you choose to use cancellation points in your parallel loops, make sure you know exactly where the implicit cancellation points are, so that your computation data will be coherent upon cancellation.
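As a rough illustration of cancellation (available since OpenMP 4.0), the sketch below stops a worksharing loop early once a negative value is found. The function contains_negative is a made-up example, not part of the code above. Note that cancellation only takes effect when the OMP_CANCELLATION environment variable is set to true; otherwise the cancel directive is ignored and the loop simply runs to completion, which still gives the same answer here.

```c
#include <stddef.h>

/* Hypothetical example: returns 1 if any element of values[0..n) is
 * negative, 0 otherwise. */
int contains_negative(const int *values, size_t n)
{
    int found = 0;

    #pragma omp parallel shared(found)
    {
        #pragma omp for
        for (size_t i = 0; i < n; i++) {
            if (values[i] < 0) {
                #pragma omp atomic write
                found = 1;
                /* Request cancellation of the enclosing omp for loop. */
                #pragma omp cancel for
            }
            /* Explicit point where other threads notice the request. */
            #pragma omp cancellation point for
        }
    }
    return found;
}
```

The only shared state here is a single flag, so it is easy to keep coherent; a real computation would need to make sure any partial results are consistent at every cancellation point.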
[2] Personally, I keep a count of how many threads have completed the for loop and, if the master thread completes the loop without catching a signal, it keeps polling for signals until it either catches one or all threads have completed the loop. To do this, make sure to mark the for loop nowait.
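One way this counting scheme might look is sketched below. Everything here is illustrative: run_until_signalled, the work and poll callbacks, and the demo_ stubs are hypothetical names, with poll standing in for whatever signal check you use (returning nonzero once a signal has been caught). The serial fallbacks exist only so the sketch also builds without OpenMP enabled.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds without OpenMP enabled. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical stubs standing in for real work and real signal polling. */
static void demo_work(size_t i) { (void)i; }
static int demo_poll_never(void) { return 0; }
static int demo_poll_always(void) { return 1; }

/* Runs work(i) for i in [0, n); returns 1 if poll() reported a signal. */
int run_until_signalled(size_t n, void (*work)(size_t), int (*poll)(void))
{
    int stop = 0;     /* shared stop flag */
    int finished = 0; /* number of threads that have left the loop */

    #pragma omp parallel shared(stop, finished)
    {
        int master = (omp_get_thread_num() == 0);

        #pragma omp for schedule(static) nowait
        for (size_t i = 0; i < n; i++) {
            int s;
            #pragma omp atomic read
            s = stop;
            if (s) continue; /* spin away the remaining iterations */
            work(i);
            if (master && poll()) {
                #pragma omp atomic write
                stop = 1;
            }
        }

        #pragma omp atomic update
        finished++;

        /* nowait lets the master get here while others are still working;
         * it keeps polling until a signal arrives or everyone is done.
         * This busy-waits; a real program might sleep between polls. */
        while (master) {
            int done, s;
            #pragma omp atomic read
            done = finished;
            #pragma omp atomic read
            s = stop;
            if (s || done == omp_get_num_threads()) break;
            if (poll()) {
                #pragma omp atomic write
                stop = 1;
            }
        }
    }
    return stop;
}
```

For example, run_until_signalled(8, demo_work, demo_poll_never) runs every iteration and returns 0, while passing demo_poll_always makes the master thread raise the stop flag after its first iteration.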