Data transfer between GPUs in OpenCL

It takes a lot of time to transfer data between different GPU devices, because this transfer probably works as follows: GPU1 -> CPU -> GPU2. So is there a better way to transfer data between GPUs? In addition, suppose there are N threads, each of which must read M elements from global memory. What conditions must be met if I want these accesses to be coalesced?

+4
3 answers

There is a clEnqueueMigrateMemObjects function, which is new in OpenCL 1.2.

This function can be used to transfer memory buffers between devices in the same context.

I have never tried it myself, so I don't know whether it actually ends up cheaper than a device -> host -> device copy (it most likely reduces to exactly that in a number of implementations).
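A minimal sketch of how the call might be used, assuming two GPUs on the same platform that share one context; buffer size, variable names and the lack of error handling are all simplifications for illustration:

```c
#include <CL/cl.h>

/* Sketch: migrate a buffer from GPU 0 to GPU 1 inside one shared context.
   Assumes both GPUs belong to the same platform; error checks omitted. */
int migrate_example(void)
{
    cl_platform_id platform;
    cl_device_id gpus[2];
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 2, gpus, NULL);

    /* One context spanning both devices is required for migration. */
    cl_context ctx = clCreateContext(NULL, 2, gpus, NULL, NULL, NULL);
    cl_command_queue q0 = clCreateCommandQueue(ctx, gpus[0], 0, NULL);
    cl_command_queue q1 = clCreateCommandQueue(ctx, gpus[1], 0, NULL);

    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, 1 << 20, NULL, NULL);

    /* ... enqueue kernels on q0 that produce data in buf ... */

    /* Move the buffer (with its contents) to the device behind q1.
       Flags = 0 means "migrate to this queue's device, keep contents". */
    cl_event moved;
    clEnqueueMigrateMemObjects(q1, 1, &buf, 0, 0, NULL, &moved);

    /* Kernels enqueued on q1 after 'moved' find the data resident on
       GPU 1, without an explicit round trip through host memory. */
    clWaitForEvents(1, &moved);
    clReleaseEvent(moved);

    clReleaseMemObject(buf);
    clReleaseCommandQueue(q1);
    clReleaseCommandQueue(q0);
    clReleaseContext(ctx);
    return 0;
}
```

Whether the runtime does a direct peer-to-peer transfer or falls back to staging through the host is up to the implementation.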

+4

Create the buffers as pinned buffers for the data on the CPU side, then access them from as many GPUs as you would like to use ...
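As a rough sketch of that idea: CL_MEM_ALLOC_HOST_PTR usually (though the spec does not guarantee it) gives you a pinned, page-locked host buffer, which each GPU's queue can then map or copy from. The function and parameter names below are made up for illustration, and error handling is omitted:

```c
#include <CL/cl.h>

/* Sketch: fill a (typically pinned) host buffer on the CPU and copy it
   to two GPUs that share the context 'ctx'. */
void broadcast_to_gpus(cl_context ctx, cl_command_queue q_gpu0,
                       cl_command_queue q_gpu1, size_t bytes)
{
    /* Most implementations back CL_MEM_ALLOC_HOST_PTR with pinned memory. */
    cl_mem pinned = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR,
                                   bytes, NULL, NULL);

    /* Map it once so the CPU can fill it with input data. */
    float *host_view = (float *)clEnqueueMapBuffer(q_gpu0, pinned, CL_TRUE,
                                                   CL_MAP_WRITE, 0, bytes,
                                                   0, NULL, NULL, NULL);
    /* ... write input data into host_view ... */
    clEnqueueUnmapMemObject(q_gpu0, pinned, host_view, 0, NULL, NULL);

    /* Each GPU copies from the pinned buffer into its own device buffer;
       transfers from pinned memory are usually faster than from pageable memory. */
    cl_mem dev0 = clCreateBuffer(ctx, CL_MEM_READ_WRITE, bytes, NULL, NULL);
    cl_mem dev1 = clCreateBuffer(ctx, CL_MEM_READ_WRITE, bytes, NULL, NULL);
    clEnqueueCopyBuffer(q_gpu0, pinned, dev0, 0, 0, bytes, 0, NULL, NULL);
    clEnqueueCopyBuffer(q_gpu1, pinned, dev1, 0, 0, bytes, 0, NULL, NULL);

    /* ... launch kernels on each GPU, then release the buffers ... */
}
```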

+2

As for the requirements for coalesced memory access, it is hard to answer them without seeing your code.

But the idea is that you get good performance when adjacent threads load data that is contiguous in memory. One common way to accomplish this is to use a structure of arrays instead of an array of structures.
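To illustrate the array-of-structures versus structure-of-arrays point, here is a hypothetical pair of kernels (the struct and kernel names are made up); only the access pattern matters:

```c
/* Array of structures: consecutive work-items read fields that are
   sizeof(Particle) bytes apart, so loads are strided and poorly coalesced. */
typedef struct { float x, y, z, w; } Particle;

__kernel void scale_aos(__global Particle *p, float s)
{
    int i = get_global_id(0);
    p[i].x *= s;   /* 16-byte stride between neighbouring work-items */
}

/* Structure of arrays: consecutive work-items read consecutive floats,
   so the loads of a whole wavefront/warp fall into a few contiguous
   memory transactions (coalesced access). */
__kernel void scale_soa(__global float *x, float s)
{
    int i = get_global_id(0);
    x[i] *= s;     /* 4-byte stride: fully contiguous */
}
```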

0
