CUDA: Summing Results

I'm using CUDA to run a problem in which a complex equation is evaluated over many input matrices. Each matrix has an identifier for the set it belongs to (1 to 30; there are 100,000 matrices in total), and the result for each matrix is stored in a float[N] array, where N is the number of input matrices.

The result I actually want is the sum of the floats in this array for each identifier, so with 30 identifiers there are 30 resulting floats.

Any suggestions on how I should do this?

At the moment I read the float array (400 KB) back from the device to the host and do the summation on the host:

    // Allocate result_array for 100,000 floats on the device
    // CUDA process input matrices
    // Read from the device back to the host into result_array
    float result[30] = { 0 };  // one sum per identifier (IDs run from 1 to 30)
    for (int i = 0; i < N; i++)
    {
        // IDs are 1-based, so shift to a 0-based index
        result[input[i].ID - 1] += result_array[i];
    }

But I wonder if there is a better way.

1 answer

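You can use cublasSasum() to do this. It's a little easier than adapting one of the SDK reduction samples (but, of course, less general). Keep in mind that it returns the sum of absolute values, so it only matches a plain sum when every result is non-negative, and it sums one vector per call, so you would call it once per identifier on that identifier's results. Check out the CUBLAS examples in the CUDA SDK.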
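If the results are not grouped by identifier, or some of them can be negative, one alternative is to do the per-identifier sum on the device with atomicAdd. The following is only a minimal sketch under some assumptions: the identifiers are available on the device as a separate int array with values 1 to 30, the GPU supports atomicAdd on float (compute capability 2.0 or later), and the names NUM_IDS, sum_by_id_kernel and sum_by_id are made up for illustration. Error checking is abbreviated.

```cuda
#include <cuda_runtime.h>

#define NUM_IDS 30  // identifiers run from 1 to NUM_IDS, as in the question

// Each block accumulates partial sums in shared memory and then merges them
// into the global totals, so each block issues only NUM_IDS global atomics.
__global__ void sum_by_id_kernel(const float *values, const int *ids,
                                 int n, float *sums)
{
    __shared__ float block_sums[NUM_IDS];

    // Zero this block's partial sums.
    for (int k = threadIdx.x; k < NUM_IDS; k += blockDim.x)
        block_sums[k] = 0.0f;
    __syncthreads();

    // Grid-stride loop over all results; IDs are 1-based.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        atomicAdd(&block_sums[ids[i] - 1], values[i]);
    __syncthreads();

    // Merge the block's partial sums into the global totals.
    for (int k = threadIdx.x; k < NUM_IDS; k += blockDim.x)
        atomicAdd(&sums[k], block_sums[k]);
}

// Hypothetical host wrapper: d_values and d_ids already live on the device,
// h_sums is a host array that receives NUM_IDS floats.
// Returns the allocation error if cudaMalloc fails, otherwise the status of
// the final device-to-host copy.
cudaError_t sum_by_id(const float *d_values, const int *d_ids, int n,
                      float *h_sums)
{
    float *d_sums = NULL;
    cudaError_t err = cudaMalloc(&d_sums, NUM_IDS * sizeof(float));
    if (err != cudaSuccess) return err;
    cudaMemset(d_sums, 0, NUM_IDS * sizeof(float));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    if (blocks > 64) blocks = 64;  // the grid-stride loop covers the rest
    if (blocks < 1) blocks = 1;
    sum_by_id_kernel<<<blocks, threads>>>(d_values, d_ids, n, d_sums);

    // Only NUM_IDS floats come back to the host instead of all n results.
    err = cudaMemcpy(h_sums, d_sums, NUM_IDS * sizeof(float),
                     cudaMemcpyDeviceToHost);
    cudaFree(d_sums);
    return err;
}
```

With this approach only 30 floats are copied back instead of the whole 400 KB array. Because the order of the atomic additions is not fixed, the sums can differ from the host loop in the last few bits.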
