Little CUDA Puzzle

I have an array A[0...N] of double and an array B[0...N] of int. Each B[i] takes a value in [0...P]. All I need to do is compute the array C[0...P]:

 C[j] = SUM( A[i] : B[i] = j) 

I cannot use N threads with atomicAdd(), since it does not support double as far as I know. A straightforward implementation with P threads diverges heavily. Is there a better way?

1 answer

If I understand correctly, you are trying to do a summing reduction of a double-precision array A with the integer keys stored in B. The Thrust template library contains a reduce_by_key operation for this. The sum rows example shows how to use reduce_by_key for a similar application, although it uses a counting iterator to generate the keys rather than a key vector provided by the user. It should be straightforward to change it to suit your needs. Note that reduce_by_key only combines runs of consecutive equal keys, so if B is not already grouped by value you will need a sort_by_key pass first.
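
Here is a minimal sketch of how that could look with user-supplied keys. It is not the sum rows example itself. The helper name sum_by_key and the parameter num_bins are made up for illustration, and it assumes every B[i] lies in [0, num_bins). It sorts the values by key, reduces each run of equal keys, then scatters the per-key sums into a zero-filled C, so keys that never occur are left at 0.

```cuda
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>
#include <thrust/scatter.h>

// Hypothetical helper: returns C with C[j] = sum of A[i] over all i where B[i] == j.
// Assumption: every key in B lies in [0, num_bins).
// A and B are taken by value because sort_by_key reorders them in place.
thrust::device_vector<double> sum_by_key(thrust::device_vector<double> A,
                                         thrust::device_vector<int> B,
                                         int num_bins)
{
    // Group equal keys together; reduce_by_key only merges consecutive equal keys.
    thrust::sort_by_key(B.begin(), B.end(), A.begin());

    // One output slot per input element is always enough room.
    thrust::device_vector<int> keys_out(B.size());
    thrust::device_vector<double> sums_out(B.size());

    // Default predicate is equality, default reduction is addition.
    // The returned pair marks the ends of the compacted keys and sums.
    auto ends = thrust::reduce_by_key(B.begin(), B.end(), A.begin(),
                                      keys_out.begin(), sums_out.begin());

    // Write each per-key sum to C[key]; keys that never occurred stay at 0.
    thrust::device_vector<double> C(num_bins, 0.0);
    thrust::scatter(sums_out.begin(), ends.second, keys_out.begin(), C.begin());
    return C;
}
```

The sort dominates the cost here. Since sort_by_key does not promise a fixed order among equal keys, the floating-point sums may differ in the last bits from one run to another.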
