CUDA __syncthreads() and recursion

I want to use __syncthreads() inside a recursive function, for example:

    __device__ void foo(int k) {
        if (some_condition) {
            for (int i = 0; i < 8; i++) {
                foo(i + k);   // foo might take longer with some inputs
                __syncthreads();
            }
        }
    }

How is __syncthreads() applied here? I know that it only synchronizes threads within a block. As far as I understand, it acts on all threads of the block regardless of recursion depth? But what if I want to make sure that a given __syncthreads() is reached at a specific recursion depth? Is that possible? I could check the recursion depth, but I believe that won't work either.

Are there any possible alternatives?

I saw that there are three __syncthreads() variants for CUDA devices of compute capability >= 2.0:

    int __syncthreads_count(int predicate);
    int __syncthreads_and(int predicate);
    int __syncthreads_or(int predicate);

But I don’t think they will help, because they seem to behave like an atomic counter.

+4
3 answers

As you know, __syncthreads() is only safe when every thread in the block reaches the barrier. This means that if you call __syncthreads() inside a conditional, the condition must evaluate identically for all threads in the block.

For __syncthreads() in recursion, this means that all threads in the block must recurse to the same depth; otherwise not all threads will reach the same barrier.
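A minimal sketch of what "same depth for all threads" can look like. The function body, the names `kernel` and `run_uniform_recursion`, and the neighbour-sum workload are all hypothetical and are not the asker's actual foo; the point is only that the recursion depth comes from a kernel argument, so the exit condition is identical for every thread and each __syncthreads() is reached by the whole block.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical workload: at each level every thread adds its right
// neighbour's value to its own. `depth` is the same for all threads.
__device__ void foo(int *data, int depth)
{
    if (depth == 0) return;          // block-uniform condition: all threads agree

    int t = threadIdx.x;
    int neighbour = data[(t + 1) % blockDim.x];
    __syncthreads();                 // every thread has read before anyone writes
    data[t] += neighbour;
    __syncthreads();                 // every thread has written before the next level

    foo(data, depth - 1);            // all threads recurse together
}

__global__ void kernel(int *data, int depth)
{
    foo(data, depth);
}

// Host helper (assumed name): runs one block of 64 threads on an array of ones.
// Returns the common value of all elements, or -1 if they differ.
int run_uniform_recursion(int depth)
{
    const int threads = 64;
    std::vector<int> h(threads, 1);
    int *d = nullptr;
    cudaMalloc(&d, threads * sizeof(int));
    cudaMemcpy(d, h.data(), threads * sizeof(int), cudaMemcpyHostToDevice);
    kernel<<<1, threads>>>(d, depth);
    // This blocking copy waits for the kernel to finish.
    cudaMemcpy(h.data(), d, threads * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    for (int v : h)
        if (v != h[0]) return -1;
    return h[0];
}
```

Starting from all ones, each level doubles every element, so a depth of 3 should leave 8 in every slot. If the exit condition depended on threadIdx.x or on per-thread data instead, some threads would skip barriers that others wait on, which is exactly the unsafe case described above.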

+7

Are there any possible alternatives?

Yes: do not use recursion to express your functional logic.
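As an illustration of the non-recursive style, here is a hedged sketch in which the "levels" that recursion would produce become iterations of a loop with a block-uniform trip count. The workload (an inclusive prefix sum within one block) and the names `scan_iterative` and `run_iterative_scan` are assumptions chosen for the example, not something taken from the question.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Inclusive prefix sum over one block, written as a loop.
// The loop bounds depend only on blockDim.x, so every thread executes
// the same number of iterations and reaches every barrier.
__global__ void scan_iterative(int *data)
{
    extern __shared__ int buf[];
    int t = threadIdx.x;
    buf[t] = data[t];
    __syncthreads();

    for (int offset = 1; offset < blockDim.x; offset *= 2) {
        // The per-thread condition only selects a value; no barrier is inside it.
        int add = (t >= offset) ? buf[t - offset] : 0;
        __syncthreads();             // all reads finished before any write
        buf[t] += add;
        __syncthreads();             // all writes finished before the next level
    }
    data[t] = buf[t];
}

// Host helper (assumed name): scans an array of `n` ones with a single block
// and returns the last element. `n` must not exceed the maximum block size.
int run_iterative_scan(int n)
{
    std::vector<int> h(n, 1);
    int *d = nullptr;
    cudaMalloc(&d, n * sizeof(int));
    cudaMemcpy(d, h.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    scan_iterative<<<1, n, n * sizeof(int)>>>(d);
    cudaMemcpy(h.data(), d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h[n - 1];
}
```

For an input of n ones the last element of the inclusive scan is n. The design point is that thread-dependent decisions stay in plain expressions, while the barriers sit at loop level where every thread passes through them.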

+2

What you said about __syncthreads() is true: it only synchronizes the threads within a single block, so you have no control over what happens in other blocks. The best way to do a reduction is in two steps. First run a reduction kernel over the entire array, producing an array of partial results with one element per block. Then, instead of copying that array back to the host, launch a second reduction with a single block and as many threads as there were blocks in the first launch, and copy the resulting single-element array from the device to the host. Remember to call cudaThreadSynchronize() (deprecated; cudaDeviceSynchronize() in current CUDA versions) between the two kernel calls, because the second reduction must not start before the first has finished. It is a two-step reduction, but it works for me.
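The two-step reduction described here could be sketched roughly as follows. The kernel name `block_sum`, the helper `two_step_sum`, the block size of 256, and the choice of integer summation are assumptions made for the example. The sketch uses cudaDeviceSynchronize(), the current replacement for cudaThreadSynchronize(); since both kernels are launched on the same default stream they already execute in order, so the call mirrors the advice above rather than being strictly required.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int THREADS = 256;         // assumed block size, must be a power of two

// Each block sums up to THREADS elements of `in` and writes one partial
// result to out[blockIdx.x].
__global__ void block_sum(const int *in, int *out, int n)
{
    __shared__ int buf[THREADS];
    int t = threadIdx.x;
    int i = blockIdx.x * blockDim.x + t;
    buf[t] = (i < n) ? in[i] : 0;    // pad with zeros past the end
    __syncthreads();

    for (int s = blockDim.x / 2; s > 0; s /= 2) {
        if (t < s) buf[t] += buf[t + s];
        __syncthreads();             // outside the if: every thread reaches it
    }
    if (t == 0) out[blockIdx.x] = buf[0];
}

// Host helper (assumed name). Works for inputs of up to THREADS * THREADS
// elements, so that the partial results fit in a single block in step two.
int two_step_sum(const std::vector<int> &h)
{
    int n = static_cast<int>(h.size());
    int blocks = (n + THREADS - 1) / THREADS;

    int *d_in = nullptr, *d_partial = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_partial, blocks * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    block_sum<<<blocks, THREADS>>>(d_in, d_partial, n);   // step 1: one partial per block
    cudaDeviceSynchronize();
    block_sum<<<1, THREADS>>>(d_partial, d_out, blocks);  // step 2: single block

    int result = 0;
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_partial);
    cudaFree(d_out);
    return result;
}
```

Only the final single-element result crosses back to the host; the per-block partial sums stay in device memory between the two launches.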

Cheers!!! Saif

0
