Copying an integer from the GPU to the CPU

I need to copy a single boolean or integer value from the device to the host after each kernel call (I call the same kernel in a for loop). In other words, after every kernel launch I need to send one integer or boolean back to the host. What is the best way to do this?

Should I write the value directly to host RAM? Or should I use cudaMemcpy()? Or is there another way to do this? Would copying just one integer after each kernel run slow down my program?

4 answers

Let me answer your last question first:

Would copying only one integer after every kernel run slow my program down?

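A little, yes. Each copy means issuing a command, waiting for the GPU to finish and respond, and so on. The amount of data (1 int vs 100 ints) probably does not matter much here, because the fixed per-transfer latency dominates. Even so, you can still do thousands of such small transfers per second. Most likely your kernel takes longer than this single transfer (if it did not, it would probably be better to do the whole task on the CPU).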
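One way to see the per-iteration cost on your own machine is a small timing loop. This is a sketch, not a rigorous benchmark: the kernel `tick` and the helper `us_per_iter` are hypothetical names, error checking is omitted for brevity, and the numbers will depend entirely on your GPU, driver and bus.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel: a single thread writes the iteration number to device memory.
__global__ void tick(int *d_out, int iter) {
    if (threadIdx.x == 0 && blockIdx.x == 0) *d_out = iter;
}

// Average wall-clock microseconds per iteration of `iters` kernel launches,
// optionally each followed by a blocking 4-byte cudaMemcpy back to the host.
double us_per_iter(int iters, bool copy_back) {
    int *d_out = nullptr;
    int h_out = -1;
    cudaMalloc(&d_out, sizeof(int));
    tick<<<1, 1>>>(d_out, 0);   // warm-up: absorbs context creation and module load
    cudaDeviceSynchronize();

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        tick<<<1, 1>>>(d_out, i);
        if (copy_back)          // blocks until the kernel is done and the int has arrived
            cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaDeviceSynchronize();    // drain any launches still queued
    auto t1 = std::chrono::steady_clock::now();

    cudaFree(d_out);
    return std::chrono::duration<double, std::micro>(t1 - t0).count() / iters;
}

int main() {
    const int iters = 10000;
    printf("launch only:         %.2f us/iter\n", us_per_iter(iters, false));
    printf("launch + 1-int copy: %.2f us/iter\n", us_per_iter(iters, true));
    return 0;
}
```

The gap between the two printed figures roughly approximates what the extra synchronizing round trip costs per iteration; compare it with how long your real kernel takes.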

What is the best way to do this?

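Honestly, I would suggest trying both yourself. As you said, you can either have the kernel write the value directly into host RAM (mapped, page-locked memory), or use cudaMemcpy. The first option may be better if the kernel still has work to do after producing the integer, because the latency of sending it to the host can then be hidden behind the rest of the kernel's execution.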

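If you use the first method, you still have to call cudaThreadSynchronize() (cudaDeviceSynchronize() in newer CUDA versions) to make sure the kernel has finished before you read the value on the host, because kernel launches are asynchronous.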
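A minimal sketch of the mapped (zero-copy) approach, assuming a hypothetical kernel `any_above` that raises a flag if any element exceeds a threshold. The flag lives in page-locked host memory obtained with `cudaHostAlloc(..., cudaHostAllocMapped)`, the kernel writes to it through the device pointer from `cudaHostGetDevicePointer`, and the host reads it only after synchronizing. Error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: sets *flag to 1 if any element is above the threshold.
// Several threads may write concurrently, but they all store the same value.
__global__ void any_above(const float *data, int n, float threshold, volatile int *flag) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && data[i] > threshold) *flag = 1;
}

int check_any_above(const float *h_data, int n, float threshold) {
    // Allow mapping host memory into the device address space. This should run
    // before any other CUDA work; its return value is ignored in this sketch.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    int *h_flag = nullptr;  // page-locked host memory the GPU can write to
    int *d_flag = nullptr;  // device-side pointer to the same memory
    cudaHostAlloc((void **)&h_flag, sizeof(int), cudaHostAllocMapped);
    cudaHostGetDevicePointer((void **)&d_flag, h_flag, 0);
    *h_flag = 0;

    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice);

    any_above<<<(n + 255) / 256, 256>>>(d_data, n, threshold, d_flag);
    // The launch returns immediately: wait for the kernel before reading the flag.
    cudaDeviceSynchronize();
    int result = *h_flag;

    cudaFree(d_data);
    cudaFreeHost(h_flag);
    return result;
}

int main() {
    float h[4] = {0.5f, 1.5f, 2.5f, 3.5f};
    printf("%d\n", check_any_above(h, 4, 3.0f));
    printf("%d\n", check_any_above(h, 4, 9.0f));
    return 0;
}
```

No explicit copy is issued for the flag; the write travels over the bus while the kernel runs, which is where the latency hiding comes from.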

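You can also use cudaMemcpyAsync, which is asynchronous as well, but the GPU will not run a kernel and a cudaMemcpyAsync in parallel unless you put them in different streams.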
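A sketch of the cudaMemcpyAsync variant, assuming a hypothetical one-thread kernel `step` that advances a counter and reports "done" once it reaches `limit`. The kernel and the copy are queued in the same stream, so they run one after the other on the GPU; the host is free between queuing them and calling `cudaStreamSynchronize`. The host buffer is pinned (`cudaMallocHost`), which an asynchronous device-to-host copy needs in order to be truly asynchronous. Error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical iteration: advance the state and report whether we are finished.
__global__ void step(int *d_state, int *d_done, int limit) {
    if (threadIdx.x == 0 && blockIdx.x == 0) {
        *d_state += 1;
        *d_done = (*d_state >= limit);
    }
}

// Runs `step` until it reports done; returns the number of iterations (limit >= 1).
int run_with_async_copy(int limit) {
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    int *d_state = nullptr, *d_done = nullptr;
    cudaMalloc(&d_state, sizeof(int));
    cudaMalloc(&d_done, sizeof(int));
    cudaMemsetAsync(d_state, 0, sizeof(int), stream);

    int *h_done = nullptr;  // pinned host memory for the returned flag
    cudaMallocHost((void **)&h_done, sizeof(int));

    int iterations = 0;
    do {
        step<<<1, 1, 0, stream>>>(d_state, d_done, limit);
        // Queued after the kernel in the same stream, so it sees the kernel's result.
        cudaMemcpyAsync(h_done, d_done, sizeof(int), cudaMemcpyDeviceToHost, stream);
        // Independent CPU work could go here; nothing has blocked yet.
        cudaStreamSynchronize(stream);  // wait until the flag has actually arrived
        ++iterations;
    } while (!*h_done);

    cudaFreeHost(h_done);
    cudaFree(d_done);
    cudaFree(d_state);
    cudaStreamDestroy(stream);
    return iterations;
}

int main() {
    printf("iterations: %d\n", run_with_async_copy(5));
    return 0;
}
```

To overlap the copy with the next kernel instead, the copy would have to go into a second stream (ordered after the kernel, for example with an event), which this sketch deliberately does not do.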

, , , , . , . - , , CUDA .


? , . " CUDA C" .


GPU , CPU. , , .

, , , , , CUDA. , GPU, , CPU; CPU.


If you need the value computed by the previous kernel call to launch the next one, the loop is inherently serialized, and your choice is a plain cudaMemcpy(dst, src, sizeof(int), ...); (note that the size argument is in bytes, not elements).
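A sketch of such a serialized loop, assuming a hypothetical search in which the next launch's parameter (`threshold`) depends on the count returned by the previous launch. The kernel `count_above` and the helper `find_threshold` are illustrative names; each iteration ends with a blocking `cudaMemcpy` of `sizeof(int)` bytes, which also waits for the kernel to finish. Error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Counts how many elements are strictly greater than the threshold.
__global__ void count_above(const int *data, int n, int threshold, int *count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && data[i] > threshold) atomicAdd(count, 1);
}

// Hypothetical search: starting at `start`, lower the threshold by 1 until at
// least `target` elements exceed it (or all of them do). Returns that threshold.
int find_threshold(const int *h_data, int n, int start, int target) {
    int *d_data = nullptr, *d_count = nullptr;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMalloc(&d_count, sizeof(int));
    cudaMemcpy(d_data, h_data, n * sizeof(int), cudaMemcpyHostToDevice);

    int threshold = start;
    int count = 0;
    for (;;) {
        cudaMemset(d_count, 0, sizeof(int));
        count_above<<<(n + 255) / 256, 256>>>(d_data, n, threshold, d_count);
        // Blocking copy of one int: waits for the kernel, then transfers 4 bytes.
        cudaMemcpy(&count, d_count, sizeof(int), cudaMemcpyDeviceToHost);
        if (count >= target || count == n) break;
        --threshold;  // the next launch depends on this result
    }

    cudaFree(d_count);
    cudaFree(d_data);
    return threshold;
}

int main() {
    int data[5] = {1, 2, 3, 4, 5};
    printf("threshold: %d\n", find_threshold(data, 5, 5, 2));
    return 0;
}
```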

If the launch parameters do not depend on earlier launches, you can store the result of every kernel call in GPU memory and copy all of the results back to the host at once at the end.
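A sketch of the batched alternative, assuming a hypothetical kernel `run_one` whose only job is to write one result into slot `launch_id` of a device array. All launches are queued without any host round trip, and a single `cudaMemcpy` at the end brings every result back. Error checking is omitted.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical per-launch work: store launch_id squared in its own slot.
__global__ void run_one(int launch_id, int *d_results) {
    if (threadIdx.x == 0 && blockIdx.x == 0)
        d_results[launch_id] = launch_id * launch_id;
}

std::vector<int> run_all(int num_launches) {
    int *d_results = nullptr;
    cudaMalloc(&d_results, num_launches * sizeof(int));

    // Launches are asynchronous and independent, so the host just queues them.
    for (int k = 0; k < num_launches; ++k)
        run_one<<<1, 1>>>(k, d_results);

    // One transfer for all results; cudaMemcpy waits for the queued kernels first.
    std::vector<int> h_results(num_launches);
    cudaMemcpy(h_results.data(), d_results, num_launches * sizeof(int),
               cudaMemcpyDeviceToHost);

    cudaFree(d_results);
    return h_results;
}

int main() {
    std::vector<int> r = run_all(4);
    for (int v : r) printf("%d ", v);
    printf("\n");
    return 0;
}
```

This replaces N small synchronizing transfers with N queued launches and one larger transfer, which is usually the cheaper pattern when nothing forces the loop to wait.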

