I want to use __syncthreads() inside a recursive device function, for example:
__device__ void foo(int k) {
    if (some_condition) {
        for (int i = 0; i < 8; i++) {
            foo(i + k);
            __syncthreads();
        }
    }
}
How is this __syncthreads() applied now? I know that it only synchronizes threads within a block. As far as I understand, it acts on all threads of the block regardless of their recursion depth. But what if I want to make sure that the barrier is reached at one specific recursion depth? Is that possible? I could check the recursion depth explicitly, but I believe this won't work either.
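To make the depth check concrete, here is a minimal sketch, not a definitive solution. It assumes that the recursion depth and the termination test are identical for every thread in the block, because __syncthreads() in conditional code is only well defined when the condition evaluates the same way across the whole block. The extra `depth` parameter, the constants `MAX_DEPTH` and `SYNC_DEPTH`, the kernel `depth_kernel`, and the helper `count_barrier_hits` are all hypothetical names introduced for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define MAX_DEPTH 3   // hypothetical recursion limit (assumption)
#define SYNC_DEPTH 1  // hypothetical depth at which the barrier is wanted

// Neither depth nor the termination test depends on threadIdx, so every
// thread of the block walks the same call tree and meets the same barrier.
__device__ void foo(int k, int depth, int *barrier_hits) {
    if (depth >= MAX_DEPTH) return;        // block-uniform condition
    for (int i = 0; i < 2; i++) {
        foo(i + k, depth + 1, barrier_hits);
        if (depth == SYNC_DEPTH) {         // same value in all threads
            __syncthreads();
            barrier_hits[threadIdx.x]++;   // record that the barrier was passed
        }
    }
}

__global__ void depth_kernel(int *barrier_hits) {
    barrier_hits[threadIdx.x] = 0;
    foo(0, 0, barrier_hits);
}

// Host helper: launch one block and return the barrier count of thread tid.
int count_barrier_hits(int threads, int tid) {
    int *d_hits = nullptr;
    cudaMalloc(&d_hits, threads * sizeof(int));
    depth_kernel<<<1, threads>>>(d_hits);
    int hits = -1;
    cudaMemcpy(&hits, d_hits + tid, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_hits);
    return hits;
}

int main() {
    // Every thread should report the same count if the barrier is uniform.
    printf("thread 0: %d hits, thread 31: %d hits\n",
           count_barrier_hits(32, 0), count_barrier_hits(32, 31));
    return 0;
}
```

If `some_condition` differs between threads of the same block, this sketch no longer applies: some threads would skip the barrier that others wait on, which is exactly the situation the depth check alone cannot repair.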
Are there any possible alternatives?
I saw that there are three __syncthreads() variants for devices of compute capability >= 2.0:
int __syncthreads_count(int predicate);
int __syncthreads_and(int predicate);
int __syncthreads_or(int predicate);
But I don't think they will help, because they seem to behave like an atomic counter.
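For reference, a minimal sketch of how these three variants are documented to behave: each one is a full __syncthreads() barrier that additionally returns a block-wide reduction of the predicate to every thread (the number of non-zero predicates, their logical AND, and their logical OR). The kernel `vote_kernel`, the helper `run_votes`, and the even/odd predicate are hypothetical and chosen only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread contributes a predicate; after the barrier every thread in the
// block receives the same reduced value.
__global__ void vote_kernel(int *out) {
    int pred = (threadIdx.x % 2 == 0);      // even-numbered threads vote "true"
    int count = __syncthreads_count(pred);  // how many predicates were non-zero
    int all   = __syncthreads_and(pred);    // non-zero only if all were non-zero
    int any   = __syncthreads_or(pred);     // non-zero if at least one was
    if (threadIdx.x == 0) {
        out[0] = count;
        out[1] = all;
        out[2] = any;
    }
}

// Host helper: launch one block of `threads` threads and copy back the results.
void run_votes(int threads, int result[3]) {
    int *d_out = nullptr;
    cudaMalloc(&d_out, 3 * sizeof(int));
    vote_kernel<<<1, threads>>>(d_out);
    cudaMemcpy(result, d_out, 3 * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
}

int main() {
    int r[3];
    run_votes(64, r);
    // The AND and OR results are normalised to 0/1 before printing.
    printf("count=%d and=%d or=%d\n", r[0], r[1] != 0, r[2] != 0);
    return 0;
}
```

Because these are still barriers, the same restriction applies as for plain __syncthreads(): all threads of the block have to reach the call, so on their own they do not make a barrier inside divergent recursion safe.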