Efficient 2D array reduction in CUDA?

The CUDA SDK includes sample code and presentation slides for efficient one-dimensional reduction. I have also seen several papers on implementing one-dimensional reductions and prefix scans in CUDA.

Is there efficient CUDA code for reducing a dense two-dimensional array? Pointers to code or related papers would be appreciated.

2 answers

I don't know exactly what problem you are trying to solve, but you could simply treat the 2D array as one long 1D array and use the SDK reduction code on it. Plain arrays in CUDA are just 1D blocks of memory with special addressing rules, so why not take advantage of that?
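To illustrate the flat-array idea, here is a minimal sketch, not the SDK sample itself. It assumes the matrix is stored contiguously (no row padding, as you would get from `cudaMallocPitch`), that a full sum is wanted, and that the device supports `atomicAdd` on `float`. The names `sumFlat` and `sum2D` are made up for this example, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Sums a flat buffer of n floats. Each thread accumulates a grid-stride
// partial sum, each block combines its partials with a shared-memory tree
// reduction, and thread 0 of each block adds the block total to *out.
__global__ void sumFlat(const float* in, size_t n, float* out) {
    extern __shared__ float smem[];
    float acc = 0.0f;
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        acc += in[i];
    smem[threadIdx.x] = acc;
    __syncthreads();
    // Tree reduction; assumes blockDim.x is a power of two.
    for (unsigned s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) smem[threadIdx.x] += smem[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) atomicAdd(out, smem[0]);
}

// Hypothetical host helper: the rows x cols matrix is handed to the kernel
// as a single 1D array of rows * cols elements.
float sum2D(const std::vector<float>& m, int rows, int cols) {
    size_t n = (size_t)rows * cols;
    if (n == 0) return 0.0f;
    float *dIn = nullptr, *dOut = nullptr, result = 0.0f;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, sizeof(float));
    cudaMemcpy(dIn, m.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dOut, 0, sizeof(float));
    const int threads = 256;  // power of two, required by the tree loop
    int blocks = (int)((n + threads - 1) / threads);
    if (blocks > 1024) blocks = 1024;  // the grid-stride loop covers the rest
    sumFlat<<<blocks, threads, threads * sizeof(float)>>>(dIn, n, dOut);
    cudaMemcpy(&result, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

Calling `sum2D(m, rows, cols)` on a host vector `m` of `rows * cols` elements returns the sum of all entries. Because the block totals are combined with floating-point atomics, the summation order (and therefore the rounding) can vary between runs.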

Matrix reduction can be somewhat easier to implement, since the reduction of each row or column vector can be performed independently. You can let each thread process one column or row (depending on whether the matrix is stored row-major or column-major) so that memory reads coalesce. I doubt you can gain much more performance without resorting to the texture / constant cache, where locality can become important.
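A minimal sketch of the one-thread-per-column approach, assuming a row-major matrix with no padding and a sum as the reduction operator. The names `columnSums` and `columnSumsHost` are made up for this example, and error checking is omitted. At each row step, neighbouring threads read neighbouring addresses, which is what lets the loads coalesce.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One thread per column of a row-major rows x cols matrix. For a fixed row r,
// thread c reads m[r * cols + c], so consecutive threads touch consecutive
// addresses and the loads of a warp can coalesce.
__global__ void columnSums(const float* m, int rows, int cols, float* out) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= cols) return;
    float acc = 0.0f;
    for (int r = 0; r < rows; ++r)
        acc += m[(size_t)r * cols + c];
    out[c] = acc;
}

// Hypothetical host helper returning one sum per column.
std::vector<float> columnSumsHost(const std::vector<float>& m, int rows, int cols) {
    std::vector<float> result(cols, 0.0f);
    if (rows == 0 || cols == 0) return result;
    size_t n = (size_t)rows * cols;
    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, cols * sizeof(float));
    cudaMemcpy(dIn, m.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    const int threads = 128;
    int blocks = (cols + threads - 1) / threads;
    columnSums<<<blocks, threads>>>(dIn, rows, cols, dOut);
    cudaMemcpy(result.data(), dOut, cols * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

This layout only exposes `cols` threads of parallelism, so it suits matrices with many columns. Reducing along rows of a row-major matrix would call for a different mapping, for example one warp or block per row, to keep the reads coalesced.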