I am confused by some comments I have seen about blocking and cudaMemcpy. As I understand it, Fermi hardware can execute kernels and perform a cudaMemcpy at the same time.
I read that the library function cudaMemcpy() is a blocking function. Does this mean that the function will block further execution until the copy has fully completed? Or does it mean that the copy will not start until the previous kernels have finished?
E.g., does this code provide the same blocking behavior?
    SomeCudaCall<<<25,34>>>(someData);
    cudaThreadSynchronize();
vs
    SomeCudaCall<<<25,34>>>(someParam);
    cudaMemcpy(toHere, fromHere, sizeof(int), cudaMemcpyHostToDevice);
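One way I could probably check this myself is sketched below. This is only a rough experiment and I have not verified it on Fermi hardware. The names busyKernel, kernelDoneWhenMemcpyReturns and CHECK are hypothetical helpers I made up for the test, and the spin length is an arbitrary guess. The idea is to record an event right after the launch, then query it once before and once after cudaMemcpy returns, to see whether the kernel had already finished by then.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(1);                                                  \
        }                                                             \
    } while (0)

// Hypothetical kernel: spins for roughly `cycles` GPU clock ticks so that it
// is likely still running when the host reaches the next API call.
__global__ void busyKernel(long long cycles, int *out)
{
    long long start = clock64();
    while (clock64() - start < cycles) { }
    if (threadIdx.x == 0 && blockIdx.x == 0) *out = 1;
}

// Returns true if the kernel had already finished when cudaMemcpy returned.
bool kernelDoneWhenMemcpyReturns()
{
    int *dFlag = NULL, *dDst = NULL;
    int hostValue = 42;
    cudaEvent_t kernelDone;

    CHECK(cudaMalloc(&dFlag, sizeof(int)));
    CHECK(cudaMalloc(&dDst, sizeof(int)));
    CHECK(cudaEventCreate(&kernelDone));

    // The launch itself is asynchronous with respect to the host.
    busyKernel<<<25, 34>>>(100000000LL, dFlag);
    // Event in the default stream: completes once the kernel has finished.
    CHECK(cudaEventRecord(kernelDone, 0));

    // Non-blocking query: cudaSuccess if complete, cudaErrorNotReady if not.
    cudaError_t beforeCopy = cudaEventQuery(kernelDone);
    printf("right after launch: %s\n",
           beforeCopy == cudaSuccess ? "kernel finished" : "kernel still running");

    // The call in question, issued in the default stream like the kernel.
    CHECK(cudaMemcpy(dDst, &hostValue, sizeof(int), cudaMemcpyHostToDevice));

    cudaError_t afterCopy = cudaEventQuery(kernelDone);

    CHECK(cudaEventDestroy(kernelDone));
    CHECK(cudaFree(dFlag));
    CHECK(cudaFree(dDst));
    return afterCopy == cudaSuccess;
}

int main()
{
    bool done = kernelDoneWhenMemcpyReturns();
    printf("after cudaMemcpy:   %s\n",
           done ? "kernel finished" : "kernel still running");
    return 0;
}
```

If the second line reports that the kernel has finished while the first reports it is still running, that would suggest cudaMemcpy waited for the kernel before returning. The sketch only covers the default stream; it says nothing about cudaMemcpyAsync or copies issued in other streams.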