What happens if four parallel CUDA applications compete for resources on the same GPU, each offloading work to the card? The CUDA 3.1 Programming Guide states that the following operations are asynchronous with respect to the host (a short sketch using these calls follows the list):
- Kernel launches
- Device-to-device memory copies
- Host-to-device copies of a memory block of 64 KB or less
- Memory copies performed by functions suffixed with Async
- Memory set function calls
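To make the list concrete, here is a minimal sketch of issuing these asynchronous calls from one application; the kernel `scale`, the buffer names, and the sizes are made up for illustration. The Async copy and the kernel launch are enqueued into a stream and return control to the host immediately:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Placeholder kernel: doubles each element of a buffer.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Async host<->device copies require page-locked (pinned) host memory.
    float *h_data, *d_data;
    cudaMallocHost((void**)&h_data, bytes);
    cudaMalloc((void**)&d_data, bytes);
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // All three calls below are asynchronous with respect to the host:
    cudaMemcpyAsync(d_data, h_data, bytes, cudaMemcpyHostToDevice, stream);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(d_data, n);
    cudaMemcpyAsync(h_data, d_data, bytes, cudaMemcpyDeviceToHost, stream);

    // The host can do other work here, then wait for the stream to drain.
    cudaStreamSynchronize(stream);
    printf("h_data[0] = %f\n", h_data[0]); // expected: 2.0

    cudaStreamDestroy(stream);
    cudaFree(d_data);
    cudaFreeHost(h_data);
    return 0;
}
```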
The guide also mentions that devices of compute capability 2.0 can execute multiple kernels concurrently, as long as the kernels belong to the same context.
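A minimal sketch of what that looks like inside a single process, assuming a compute capability 2.0+ device; the kernel `busyKernel` and the sizes are invented for illustration. Small kernels launched into distinct streams of the same context may overlap on the GPU, whereas kernels issued from different processes (different contexts) do not:

```cuda
#include <cuda_runtime.h>

// Placeholder kernel that burns some time per element.
__global__ void busyKernel(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = out[i];
        for (int k = 0; k < 10000; ++k) v = v * 1.000001f + 0.000001f;
        out[i] = v;
    }
}

int main(void) {
    const int n = 256, nStreams = 4;
    float *d_buf[nStreams];
    cudaStream_t streams[nStreams];

    for (int s = 0; s < nStreams; ++s) {
        cudaMalloc((void**)&d_buf[s], n * sizeof(float));
        cudaMemset(d_buf[s], 0, n * sizeof(float));
        cudaStreamCreate(&streams[s]);
    }

    // Launches into different streams of the SAME context: on a compute
    // capability >= 2.0 device these single-block grids can run concurrently.
    for (int s = 0; s < nStreams; ++s)
        busyKernel<<<1, n, 0, streams[s]>>>(d_buf[s], n);

    cudaDeviceSynchronize();

    for (int s = 0; s < nStreams; ++s) {
        cudaStreamDestroy(streams[s]);
        cudaFree(d_buf[s]);
    }
    return 0;
}
```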
Is this kind of concurrency available only between threads/streams of a single CUDA application, and impossible when several different applications request GPU resources?
Does this mean that concurrent execution is supported only within a single application (context?), and that although the 4 applications can run at the same time, with their calls interleaved by context switching on the CPU, each application still has to wait for the GPU to be released by the others? (For example, a kernel launch from application 4 would have to wait for a kernel launched by application 1 to finish.)
If so, how can these 4 applications share GPU resources without incurring long wait times?
c parallel-processing gpgpu cuda nvidia
Bartzilla