Generalized Hough transform in CUDA: how can I speed up the binning process?

As the title says, I am working on a small personal study of parallel computer vision techniques. I am using CUDA to implement a GPGPU version of the Hough transform. The only problem I have run into is in the voting process. I call atomicAdd() to prevent multiple simultaneous writes to the same bin, and the performance gain seems small. I searched the web but did not find any way to make the voting process significantly more efficient.

Any help you could provide regarding the voting process would be greatly appreciated.

+4
2 answers

I am not familiar with the Hough transform, so posting some pseudocode would help here. But if you are interested in voting, you might consider using CUDA's intrinsic vote functions to speed this up.

Note: this requires compute capability 2.0 or later (Fermi or later).

If you want to count the number of threads in a block for which a specific condition is true, you can simply use __syncthreads_count():

    bool condition = ...;  // compute the condition
    int blockCount = __syncthreads_count(condition);  // must be in non-divergent code

If you want to count the number of threads in the whole grid for which the condition is true, you can then combine the per-block counts with atomicAdd():

    bool condition = ...;  // compute the condition
    int blockCount = __syncthreads_count(condition);  // must be in non-divergent code
    if (threadIdx.x == 0)  // every thread receives blockCount, so add it only once per block
        atomicAdd(totalCount, blockCount);
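To make the grid-wide count concrete, here is a minimal sketch of a complete kernel plus host wrapper. The names countAbove and countAboveOnGpu, the "value greater than threshold" condition, and the block size of 256 are assumptions chosen for illustration, not part of the original question, and CUDA error checking is left out for brevity.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: counts how many input values exceed `threshold`.
__global__ void countAbove(const int *data, int n, int threshold, int *totalCount)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Out-of-range threads vote "false" instead of returning early, so every
    // thread of the block reaches __syncthreads_count().
    int condition = (i < n) && (data[i] > threshold);

    // Barrier plus count of threads in this block whose condition is non-zero.
    int blockCount = __syncthreads_count(condition);

    // Every thread receives the same blockCount, so only one thread adds it.
    if (threadIdx.x == 0)
        atomicAdd(totalCount, blockCount);
}

// Hypothetical host wrapper; CUDA error checking is omitted for brevity.
int countAboveOnGpu(const std::vector<int> &host, int threshold)
{
    int n = static_cast<int>(host.size());
    assert(n > 0);

    int *dData = nullptr;
    int *dCount = nullptr;
    cudaMalloc(&dData, n * sizeof(int));
    cudaMalloc(&dCount, sizeof(int));
    cudaMemcpy(dData, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dCount, 0, sizeof(int));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    countAbove<<<blocks, threads>>>(dData, n, threshold, dCount);

    int result = 0;
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&result, dCount, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dCount);
    return result;
}
```

With this sketch, calling countAboveOnGpu on the values 0..999 with a threshold of 499 should return 500, using one atomicAdd per block rather than one per thread.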

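If you need to count the number of threads for which the condition is true within a group smaller than a block, you can use __ballot() and __popc() (population count) to count per warp. On CUDA 9 and later, __ballot() is deprecated in favor of __ballot_sync(), which takes an explicit mask of participating lanes.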

    // get the count of threads within each warp for which the condition is true
    bool condition = ...;  // compute the condition in each thread
    int warpCount = __popc(__ballot(condition));  // see the CUDA programming guide for details
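As a rough illustration of the per-warp count, the sketch below uses __ballot_sync(), the CUDA 9+ replacement for __ballot(). The names countPerWarp and warpCountsOnGpu, the threshold condition, and the block size of 128 are assumptions for this example. It relies on the block size being a multiple of 32 so that every warp is full.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each warp records how many of its lanes hold a value
// greater than `threshold`. Assumes blockDim.x is a multiple of 32.
__global__ void countPerWarp(const int *data, int n, int threshold, int *warpCounts)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Out-of-range lanes vote "false" so the whole warp reaches the ballot.
    int condition = (i < n) && (data[i] > threshold);

    // Bit k of `votes` is set if lane k's condition is non-zero.
    unsigned int votes = __ballot_sync(0xffffffffu, condition);
    int warpCount = __popc(votes);

    // Lane 0 writes the result for its warp; i / 32 is the global warp index.
    if ((threadIdx.x & 31) == 0)
        warpCounts[i / 32] = warpCount;
}

// Hypothetical host wrapper; CUDA error checking is omitted for brevity.
std::vector<int> warpCountsOnGpu(const std::vector<int> &host, int threshold)
{
    int n = static_cast<int>(host.size());
    assert(n > 0);

    const int threads = 128;  // multiple of the warp size (32)
    const int blocks = (n + threads - 1) / threads;
    const int numWarps = blocks * threads / 32;

    int *dData = nullptr;
    int *dCounts = nullptr;
    cudaMalloc(&dData, n * sizeof(int));
    cudaMalloc(&dCounts, numWarps * sizeof(int));
    cudaMemcpy(dData, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    countPerWarp<<<blocks, threads>>>(dData, n, threshold, dCounts);

    std::vector<int> counts(numWarps);
    cudaMemcpy(counts.data(), dCounts, numWarps * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dCounts);
    return counts;
}
```

For the 64 input values 0..63 and a threshold of 39, the first warp (values 0..31) should report 0 and the second warp (values 32..63) should report 24. The remaining warps of the block hold only out-of-range lanes and report 0.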

Hope this helps.

+1

I used the voting functions for a very short time ...

In the end, atomicAdd turned out to be even faster in both scenarios.

This link is very useful: warp filtering

This is the problem I solved: writing data only from selected lanes in a warp using shuffle + ballot + popc

Aren't you looking for a critical section?
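For readers unfamiliar with the shuffle + ballot + popc pattern mentioned above, here is a minimal sketch of warp-aggregated filtering, where each warp issues a single atomicAdd to reserve output slots for all of its selected lanes. It illustrates the general technique and is not necessarily the code behind the linked material. The names filterAbove and filterAboveOnGpu, the threshold condition, and the block size are assumptions.

```cuda
#include <algorithm>
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: copies values greater than `threshold` to `out`.
// Assumes blockDim.x is a multiple of 32 so that every warp is full.
__global__ void filterAbove(const int *in, int n, int threshold, int *out, int *outCount)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x & 31;

    // Out-of-range lanes are simply "not selected"; no early return.
    int keep = (i < n) && (in[i] > threshold);

    unsigned int mask = __ballot_sync(0xffffffffu, keep);  // selected lanes
    int warpTotal = __popc(mask);                          // how many selected
    int rank = __popc(mask & ((1u << lane) - 1u));         // selected lanes below this one

    // Lane 0 reserves warpTotal consecutive slots with one atomic operation.
    int base = 0;
    if (lane == 0 && warpTotal > 0)
        base = atomicAdd(outCount, warpTotal);

    // Broadcast the reserved base offset from lane 0 to the whole warp.
    base = __shfl_sync(0xffffffffu, base, 0);

    if (keep)
        out[base + rank] = in[i];
}

// Hypothetical host wrapper; CUDA error checking is omitted for brevity.
std::vector<int> filterAboveOnGpu(const std::vector<int> &host, int threshold)
{
    int n = static_cast<int>(host.size());
    assert(n > 0);

    int *dIn = nullptr;
    int *dOut = nullptr;
    int *dCount = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dOut, n * sizeof(int));
    cudaMalloc(&dCount, sizeof(int));
    cudaMemcpy(dIn, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dCount, 0, sizeof(int));

    const int threads = 128;  // multiple of the warp size (32)
    const int blocks = (n + threads - 1) / threads;
    filterAbove<<<blocks, threads>>>(dIn, n, threshold, dOut, dCount);

    int count = 0;
    cudaMemcpy(&count, dCount, sizeof(int), cudaMemcpyDeviceToHost);
    std::vector<int> result(count);
    if (count > 0)
        cudaMemcpy(result.data(), dOut, count * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    cudaFree(dCount);

    // Warps may reserve their output ranges in any order, so sort the result
    // to make it deterministic.
    std::sort(result.begin(), result.end());
    return result;
}
```

For the input values 0..99 and a threshold of 89, the wrapper should return the ten values 90..99. The number of atomic operations drops from one per selected thread to at most one per warp, which is the main appeal of the pattern when many lanes write to the same counter.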

0