I have heard good things about Parallel Programming Patterns, and I think combining it with the CUDA Best Practices Guide (included in the CUDA Toolkit) should be a good start.
Sites such as Sean Baxter's ModernGPU, as mentioned by marina.k, will help with CUDA implementations of some common parallel algorithm patterns (though unless you are doing this as a training exercise, I would use Thrust or Sean's ModernGPU code rather than implementing my own).