Question 1: In this scenario, am I fine with using one context and one explicit stream created in that context for each host thread that runs the pipeline?
You should be fine with one context.
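As a minimal sketch of that pattern (runtime API, so the single context is implicit; the kernel, thread count, and buffer size here are hypothetical stand-ins for the real pipeline):

```cuda
#include <cuda_runtime.h>
#include <thread>
#include <vector>

// Stand-in for the real pipeline work.
__global__ void pipeline_kernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

void run_pipeline(int n) {
    cudaStream_t stream;
    cudaStreamCreate(&stream);        // stream lives in the shared (implicit) context

    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    pipeline_kernel<<<(n + 255) / 256, 256, 0, stream>>>(d_data, n);
    cudaStreamSynchronize(stream);    // waits only on this thread's stream

    cudaFree(d_data);
    cudaStreamDestroy(stream);
}

int main() {
    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t)       // one stream per host thread
        workers.emplace_back(run_pipeline, 1 << 20);
    for (auto &w : workers) w.join();
    return 0;
}
```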
Question 2: Can someone enlighten me: what is the actual purpose of the CUDA device context? I suppose it makes sense in a multi-GPU scenario, but are there cases where I would want to create multiple contexts for a single GPU?
The CUDA device context is discussed in the programming guide. It represents all of the state (memory map, allocations, kernel definitions, and other state-related information) associated with a particular process, i.e., with that process's use of the GPU. Separate processes normally have separate contexts (as do separate devices), since those processes use the GPU independently and have independent memory maps.
If you have multiple processes using a GPU, you will normally have multiple contexts created on that GPU. As you have discovered, it is possible to create several contexts from a single process, but it is usually not necessary.
And yes, when you have multiple contexts, kernels launched in those contexts require a context switch to go from a kernel in one context to a kernel in another; those kernels cannot run concurrently.
The CUDA runtime API manages contexts for you; you normally do not interact with a CUDA context explicitly when using the runtime API. With the driver API, however, the context is explicitly created and managed.
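A minimal sketch of that explicit driver-API flow (assuming device 0 and default context flags):

```cuda
#include <cuda.h>

int main() {
    CUdevice dev;
    CUcontext ctx;

    cuInit(0);                  // must precede any other driver API call
    cuDeviceGet(&dev, 0);       // pick the first GPU
    cuCtxCreate(&ctx, 0, dev);  // create a context and make it current on this thread

    // ... cuMemAlloc, cuModuleLoad, cuLaunchKernel, etc. would go here ...

    cuCtxDestroy(ctx);          // tears down all state owned by the context
    return 0;
}
```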