I do not agree with @talonmies' answer. cuCtxCreate() can be called from any process associated with a specific device, and the resulting context is only loosely coupled to threads. In general, OS threads can switch their current CUDA context through the CUDA driver API, so a context does not appear to be permanently attached to a particular thread or process ID. Threads make CUDA API calls using whichever CUDA context is "current" to them, and according to the CUDA documentation one is sometimes even initialized implicitly on the CUDA client's behalf.
There must be a way to share a CUDA context between multiple processes, because this is what CUDA MPS does: a single server holds one CUDA context on behalf of multiple CUDA clients. You could write your own MPS-like layer by using LD_PRELOAD to intercept CUDA driver API calls (see the cuHook example in the CUDA samples) and placing the CUDA context object somewhere in memory shared among all of your CUDA client processes. CUDA contexts are documented as thread-safe, so locking may not even be necessary unless you want to impose an ordering of your own. The real CUDA driver API calls can then be wrapped in cuCtxPushCurrent() and cuCtxPopCurrent() so that your client code always uses the common global CUDA context safely.