For example, I have an array of (x, y) points, and I want to organize them into a kd-tree.
Building a kd-tree involves sorting and computing bounding rectangles. Each of these steps runs well on CUDA on its own, but is there a way to build the whole kd-tree while using as many threads as possible?
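Computing a bounding rectangle is a reduction. Merging two boxes is associative and commutative, so it parallelizes as a tree of pairwise merges. Below is a minimal host-side C++17 sketch of that merge operator. The names `Point`, `Box`, `mergeBoxes` and `boundingBox` are illustrative, not from any library. On the GPU, the same operator (marked `__host__ __device__`) could be handed to a parallel reduction such as `thrust::transform_reduce`.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <numeric>
#include <vector>

struct Point { float x, y; };

// Axis-aligned bounding rectangle.
struct Box { float minx, miny, maxx, maxy; };

// Identity element: an "empty" box that any real box absorbs when merged.
Box emptyBox() {
    const float inf = std::numeric_limits<float>::infinity();
    return {inf, inf, -inf, -inf};
}

// Associative and commutative, which is what a parallel reduction requires.
Box mergeBoxes(const Box& a, const Box& b) {
    return {std::min(a.minx, b.minx), std::min(a.miny, b.miny),
            std::max(a.maxx, b.maxx), std::max(a.maxy, b.maxy)};
}

// Each point becomes a degenerate box, then all boxes are merged.
Box boundingBox(const std::vector<Point>& pts) {
    return std::transform_reduce(
        pts.begin(), pts.end(), emptyBox(), mergeBoxes,
        [](const Point& p) { return Box{p.x, p.y, p.x, p.y}; });
}
```

Because the reduction order is unspecified, `std::transform_reduce` is only correct with an operator like `mergeBoxes` that does not care how the merges are grouped.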
I suspect this needs a few tricks, for example:
Typically, a kd-tree is built recursively, but as far as I know, CUDA processors do not have a hardware stack, so recursion should be avoided.
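One common way to avoid recursion is to build the tree breadth-first: keep an explicit list of the array segments at the current depth and split all of them in one pass. Below is a minimal CPU sketch in C++. It assumes a balanced 2-D tree stored implicitly in the point array itself, where the median of each segment is the node. The names `Segment` and `buildKdTree` are hypothetical. It uses `std::nth_element` for per-segment median selection. A GPU port would replace the inner loop with one thread or block per segment, or with a segmented sort such as CUB's `DeviceSegmentedRadixSort`, leaving only the outer per-level loop on the host.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { float x, y; };

// A contiguous slice [begin, end) of the point array that still has to be split.
struct Segment { std::size_t begin, end; };

// Builds a balanced 2-D kd-tree in place, without recursion.
// Afterwards each node is the median element `mid` of its segment; its left
// subtree occupies [begin, mid) and its right subtree [mid + 1, end).
// The split axis at depth d is d % 2 (0 = x, 1 = y).
void buildKdTree(std::vector<Point>& pts) {
    std::vector<Segment> level{{0, pts.size()}};
    int depth = 0;
    while (!level.empty()) {
        std::vector<Segment> next;
        const int axis = depth % 2;
        auto less = [axis](const Point& a, const Point& b) {
            return axis == 0 ? a.x < b.x : a.y < b.y;
        };
        // Segments in one level are independent of each other: on a GPU,
        // this loop body is the work one thread (or block) would do.
        for (const Segment& s : level) {
            if (s.end - s.begin <= 1) continue;  // empty or single-point leaf
            const std::size_t mid = s.begin + (s.end - s.begin) / 2;
            // Median lands at `mid`; nothing before it compares greater,
            // nothing after it compares smaller.
            std::nth_element(pts.begin() + s.begin, pts.begin() + mid,
                             pts.begin() + s.end, less);
            next.push_back({s.begin, mid});
            next.push_back({mid + 1, s.end});
        }
        level.swap(next);
        ++depth;
    }
}
```

Each level halves the segment sizes, so the outer loop runs about log2(n) times. The number of independent segments, and therefore the available parallelism, doubles at every level.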
How can I efficiently build a kd-tree in CUDA?