Multiple CUDA contexts for a single device - does it make any sense?

I thought I understood this, but apparently I don't :) I need to encode H.264 streams in parallel using NVENC, from frames that do not arrive in any of the formats the encoder accepts, so I have the following pipeline (a rough code sketch follows the list):

  • A callback fires, telling me that a new frame has arrived.
  • I copy the frame to CUDA memory and perform the necessary color space conversions (only the first cuMemcpy is synchronous, so I can return from the callback right away; all remaining operations are enqueued on a dedicated stream).
  • I record an event on the stream and wait for it on another thread; as soon as it is signaled, I take the CUDA memory pointer holding the frame in the correct color space and pass it to the encoder.
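
In code, the pipeline looks roughly like this (a minimal driver-API sketch; launch_convert_kernel and feed_to_encoder are hypothetical placeholders for my conversion kernel launch and the NVENC hand-off, and error checking is omitted):

    #include <cuda.h>
    #include <stddef.h>

    /* Hypothetical helpers, not part of CUDA: the color-space
       conversion kernel launch and the NVENC hand-off. */
    void launch_convert_kernel(CUdeviceptr src, CUdeviceptr dst, CUstream s);
    void feed_to_encoder(CUdeviceptr frame);

    typedef struct {
        CUstream    stream;  /* dedicated stream for this transcoder       */
        CUevent     done;    /* signaled when the converted frame is ready */
        CUdeviceptr src;     /* raw frame in device memory                 */
        CUdeviceptr dst;     /* frame converted for the encoder            */
    } Pipeline;              /* stream, event, buffers created up front    */

    /* Runs in the capture callback (producer thread). */
    void on_frame(Pipeline *p, const void *host_frame, size_t bytes)
    {
        /* Only this copy is synchronous; we can return from the
           callback as soon as it completes. */
        cuMemcpyHtoD(p->src, host_frame, bytes);

        /* Conversion is enqueued asynchronously on the dedicated stream. */
        launch_convert_kernel(p->src, p->dst, p->stream);

        /* Mark the point at which the converted frame is ready. */
        cuEventRecord(p->done, p->stream);
    }

    /* Runs on the encoder thread (consumer). */
    void encode_frame(Pipeline *p)
    {
        cuEventSynchronize(p->done);  /* wait for the conversion */
        feed_to_encoder(p->dst);      /* hand the pointer to NVENC */
    }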

For some reason I assumed that I needed a dedicated context for each thread if I run this pipeline from parallel threads. The code was slow, and after some reading I learned that context switching is actually expensive, and then I concluded that it makes no sense anyway, since a context owns the whole GPU, so I would block any parallel processing by the other transcoder pipelines.

Question 1: In this scenario, am I fine using one context and an explicit stream, created in this context, for each thread that runs the described pipeline?

Question 2: Can someone enlighten me as to the actual purpose of the CUDA device context? I suppose it makes sense in a multi-GPU scenario, but are there any cases where I would want to create multiple contexts for a single GPU?

1 answer

Question 1: In this scenario, am I fine using one context and an explicit stream, created in this context, for each thread that runs the described pipeline?

You should be fine with one context.
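
For illustration, a minimal sketch of that setup, assuming a single device: one context created up front, then one stream per worker thread (driver API, error handling omitted):

    #include <cuda.h>

    static CUcontext g_ctx;  /* one context shared by all pipeline threads */

    void init_cuda(void)
    {
        CUdevice dev;
        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&g_ctx, 0, dev);  /* created once, on the main thread */
    }

    /* Each transcoder thread binds the shared context and uses its
       own stream, so the pipelines can overlap on the GPU. */
    void *transcoder_thread(void *arg)
    {
        CUstream stream;
        (void)arg;
        cuCtxSetCurrent(g_ctx);       /* make the shared context current */
        cuStreamCreate(&stream, CU_STREAM_NON_BLOCKING);

        /* ... run the conversion pipeline on `stream` ... */

        cuStreamDestroy(stream);
        return NULL;
    }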

Question 2: Can someone enlighten me as to the actual purpose of the CUDA device context? I suppose it makes sense in a multi-GPU scenario, but are there any cases where I would want to create multiple contexts for a single GPU?

The CUDA device context is discussed in the programming guide. It represents all of the state (memory map, allocations, kernel definitions, and other state-related information) associated with a particular process (i.e., associated with that process's use of the GPU). Separate processes normally have separate contexts (just as separate devices do), since each process uses the GPU independently and has its own memory map.

If you have multiple processes sharing a GPU, you will normally create multiple contexts on that GPU. As you discovered, it is possible to create multiple contexts from a single process, but it is not usually necessary.
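
To make the ownership concrete, a small sketch, assuming a single device: an allocation belongs to the context that was current when it was made, and its lifetime ends with that context (driver API, error handling omitted):

    #include <cuda.h>

    void two_contexts_demo(void)
    {
        CUdevice dev;
        CUcontext ctxA, ctxB;
        CUdeviceptr pA;

        cuInit(0);
        cuDeviceGet(&dev, 0);

        cuCtxCreate(&ctxA, 0, dev);  /* ctxA is now current on this thread */
        cuMemAlloc(&pA, 1 << 20);    /* allocation is owned by ctxA        */

        cuCtxCreate(&ctxB, 0, dev);  /* ctxB becomes current; it has its
                                        own, independent memory map        */

        cuCtxDestroy(ctxB);
        cuCtxDestroy(ctxA);          /* destroying ctxA also frees pA      */
    }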

And yes, when you have multiple contexts, kernels launched in those contexts require context switching to go from a kernel in one context to a kernel in another context. Those kernels cannot run concurrently.

The CUDA runtime API manages contexts for you. You normally do not interact with a CUDA context explicitly when using the runtime API. However, when using the driver API, the context is created and managed explicitly.
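
For contrast, a minimal runtime-API sketch: there is no context handle anywhere, because the runtime creates and uses the device's primary context behind the scenes:

    #include <cuda_runtime.h>

    void runtime_example(void)
    {
        float *d_buf;
        cudaSetDevice(0);  /* implicitly uses device 0's primary context */
        cudaMalloc(&d_buf, 1024 * sizeof(float));
        /* ... launch kernels, copy data ... */
        cudaFree(d_buf);
    }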
