Why do we need cudaDeviceSynchronize() after kernels that use printf?

    #include <cstdio>

    __global__ void helloCUDA(float f)
    {
        printf("Hello thread %d, f=%f\n", threadIdx.x, f);
    }

    int main()
    {
        helloCUDA<<<1, 5>>>(1.2345f);
        cudaDeviceSynchronize();
        return 0;
    }

Why is cudaDeviceSynchronize() needed here after the kernel call, when in many other places, for example , it is not?

Answer

Kernel launches are asynchronous: the launch call returns control to the CPU thread as soon as the work has been queued for the GPU, before the kernel has finished executing.
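A minimal sketch of that asynchrony, assuming a CUDA-capable device: the kernel `spin` and the helper `launch_then_query` are hypothetical names, and the cycle count is an arbitrary value chosen only to keep the kernel busy for a moment. Right after the launch returns, `cudaStreamQuery(0)` will usually report `cudaErrorNotReady` because the kernel is still running; after `cudaDeviceSynchronize()` it reports `cudaSuccess`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Busy-waits for roughly `cycles` GPU clock ticks so the kernel is still
// running when the host checks on it (illustrative value, not tuned).
__global__ void spin(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) {
        // keep the GPU busy
    }
}

// Hypothetical helper: launches the kernel, queries the default stream
// before and after synchronizing, and returns the status seen afterwards.
cudaError_t launch_then_query()
{
    spin<<<1, 1>>>(100000000LL);

    // The launch call has already returned, but the kernel is most likely
    // still executing, so this usually prints cudaErrorNotReady.
    cudaError_t before = cudaStreamQuery(0);
    printf("before sync: %s\n", cudaGetErrorName(before));

    // Block the host thread until all previously issued GPU work is done.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) return err;

    cudaError_t after = cudaStreamQuery(0);
    printf("after sync:  %s\n", cudaGetErrorName(after));
    return after;
}

int main()
{
    return launch_then_query() == cudaSuccess ? 0 : 1;
}
```

The first query is not guaranteed to see the kernel still running, which is why only the post-synchronization status is used as the return value.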

So what happens next in this case? The application exits.

When the application exits, the operating system tears down the process, and with it the ability to write anything to its standard output.

So any output the kernel produces after that point has nowhere to go, and you will not see it.

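If, on the other hand, you call cudaDeviceSynchronize(), the host thread blocks until the kernel has completed. Synchronization is also one of the points at which the device-side printf buffer is flushed to the host, so the kernel's output reaches the still-open standard output before the application can exit.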
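A sketch of the question's program with the return values checked, assuming a CUDA-capable device. The helper `run_hello` is a hypothetical name, and the `cudaGetLastError()` check is an addition not present in the original code; it catches launch errors, while `cudaDeviceSynchronize()` both waits for the kernel (flushing its printf output) and reports errors raised during execution.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void helloCUDA(float f)
{
    // %u matches the unsigned type of threadIdx.x
    printf("Hello thread %u, f=%f\n", threadIdx.x, f);
}

// Hypothetical helper: launches the kernel, waits for it, and returns the
// first error encountered (cudaSuccess if everything worked).
cudaError_t run_hello()
{
    helloCUDA<<<1, 5>>>(1.2345f);

    // Reports problems with the launch itself, such as a bad configuration.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;

    // Blocks until the kernel finishes; the device printf buffer is flushed
    // to the host's stdout here. Also reports errors from kernel execution.
    return cudaDeviceSynchronize();
}

int main()
{
    cudaError_t err = run_hello();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

According to the CUDA C Programming Guide's section on formatted output, other synchronization points such as cudaStreamSynchronize() or a blocking cudaMemcpy() also flush the printf buffer, so any of them would serve the same purpose here.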
