What happens if four parallel CUDA applications compete for resources on the same GPU, each offloading work to the card? The CUDA 3.1 Programming Guide states that the following operations are asynchronous with respect to the host (a short sketch using these calls follows the list):
- Kernel launches
- Device-to-device memory copies
- Host-to-device copies of a memory block of 64 KB or less
- Memory copies performed by functions suffixed with Async
- Memory set function calls
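To make the list concrete, here is a minimal sketch of issuing these asynchronous calls from one application; the kernel `scale`, the buffer names, and the sizes are made up for illustration. The Async copy and the kernel launch are enqueued into a stream and return control to the host immediately:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Placeholder kernel: doubles each element of a buffer.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Async host<->device copies require page-locked (pinned) host memory.
    float *h_data, *d_data;
    cudaMallocHost((void**)&h_data, bytes);
    cudaMalloc((void**)&d_data, bytes);
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // All three calls below are asynchronous with respect to the host:
    cudaMemcpyAsync(d_data, h_data, bytes, cudaMemcpyHostToDevice, stream);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(d_data, n);
    cudaMemcpyAsync(h_data, d_data, bytes, cudaMemcpyDeviceToHost, stream);

    // The host can do other work here, then wait for the stream to drain.
    cudaStreamSynchronize(stream);
    printf("h_data[0] = %f\n", h_data[0]); // expected: 2.0

    cudaStreamDestroy(stream);
    cudaFree(d_data);
    cudaFreeHost(h_data);
    return 0;
}
```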
The guide also mentions that devices of compute capability 2.0 can execute multiple kernels concurrently, as long as the kernels belong to the same context.
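A minimal sketch of what that looks like inside a single process, assuming a compute capability 2.0+ device; the kernel `busyKernel` and the sizes are invented for illustration. Small kernels launched into distinct streams of the same context may overlap on the GPU, whereas kernels issued from different processes (different contexts) do not:

```cuda
#include <cuda_runtime.h>

// Placeholder kernel that burns some time per element.
__global__ void busyKernel(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = out[i];
        for (int k = 0; k < 10000; ++k) v = v * 1.000001f + 0.000001f;
        out[i] = v;
    }
}

int main(void) {
    const int n = 256, nStreams = 4;
    float *d_buf[nStreams];
    cudaStream_t streams[nStreams];

    for (int s = 0; s < nStreams; ++s) {
        cudaMalloc((void**)&d_buf[s], n * sizeof(float));
        cudaMemset(d_buf[s], 0, n * sizeof(float));
        cudaStreamCreate(&streams[s]);
    }

    // Launches into different streams of the SAME context: on a compute
    // capability >= 2.0 device these single-block grids can run concurrently.
    for (int s = 0; s < nStreams; ++s)
        busyKernel<<<1, n, 0, streams[s]>>>(d_buf[s], n);

    cudaDeviceSynchronize();

    for (int s = 0; s < nStreams; ++s) {
        cudaStreamDestroy(streams[s]);
        cudaFree(d_buf[s]);
    }
    return 0;
}
```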
Is this kind of concurrency available only between threads/streams of a single CUDA application, and impossible when several different applications request GPU resources?
Does this mean that concurrent execution is supported only within a single application (context?), and that although the 4 applications can run at the same time, with their calls interleaved by context switching on the CPU, each application still has to wait for the GPU to be released by the others? (For example, a kernel launch from application 4 would have to wait for a kernel launched by application 1 to finish.)
If so, how can these 4 applications share GPU resources without incurring long wait times?
c parallel-processing gpgpu cuda nvidia
Bartzilla