Working with large switch statements in CUDA

Question

I understand that branching in CUDA is not recommended, as this can adversely affect performance. In my work, I found that I have to execute large switch statements that contain more than a few dozen cases.

Does anyone know how much this will affect performance? (The official documentation is not very specific.) Does anyone have a more efficient way to handle this?

+4

parallel-processing switch-statement cuda statements

gamerx Jun 25 '12 at 8:21


2 answers

The GPU runs threads in groups of 32, called warps. Whenever different threads in a warp take different paths through the code, the GPU has to run the whole warp multiple times, once for each code path.

To deal with this problem, known as warp divergence, you want to arrange your threads so that the threads in any given warp follow as few different code paths as possible. Once you have done that, you pretty much just have to bite the bullet and accept the performance loss caused by any remaining divergence. In some cases there may be nothing you can do to arrange your threads. If so, and if the different code paths make up a large part of your kernel or overall workload, the task may not be a good fit for the GPU.
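One way to arrange threads, sketched here under assumptions that are not part of the original answer: if every work item carries an integer case id, the items can be sorted by that id before the kernel launch, so that neighbouring threads (and therefore most warps) take the same branch. The kernel `process`, the helper `process_grouped` and the three toy cases are hypothetical; `thrust::sort_by_key` is the Thrust library call used for the grouping.

```cuda
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>

// Hypothetical per-item work: the case id selects the code path.
__global__ void process(const int *case_id, float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    switch (case_id[i]) {
        case 0: data[i] += 1.0f;      break;
        case 1: data[i] *= 2.0f;      break;
        case 2: data[i] = -data[i];   break;
        default:                      break;  // unknown ids leave the item untouched
    }
}

// Groups items by case id, then runs the kernel on the grouped data.
// Note: both vectors are left in sorted order, not in their original order.
void process_grouped(thrust::device_vector<int> &case_id,
                     thrust::device_vector<float> &data)
{
    int n = static_cast<int>(data.size());
    if (n == 0) return;

    // After sorting, only warps that straddle a boundary between two
    // case ids still execute more than one path.
    thrust::sort_by_key(case_id.begin(), case_id.end(), data.begin());

    const int threads = 256;
    process<<<(n + threads - 1) / threads, threads>>>(
        thrust::raw_pointer_cast(case_id.data()),
        thrust::raw_pointer_cast(data.data()), n);
    cudaDeviceSynchronize();  // error checking omitted for brevity
}
```

The sort is not free, so this only pays off when the divergent branches are expensive compared with the cost of reordering the data. If the original order matters, an index array would have to be sorted alongside the data and used to scatter the results back.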

It doesn't matter how you implement the different code paths. if-else, switch, predication (in PTX or SASS), branch tables or anything else: if it comes down to the threads in a warp taking different paths, you take a performance hit.

It also doesn't matter how many threads go through each path, just the total number of different paths in the warp.

See also this other answer, which goes into more detail.

+4

Roger Dahl Jun 26 '12 at 5:20


geek · Accepted Answer · 2012-06-25T09:33:23+0000

A good way to avoid multiple switches is to implement a function table and select a function from the table by an index based on your switch condition. CUDA allows the use of pointers to __device__ functions in kernels.
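A minimal sketch of such a function table, assuming three toy operations that all share one signature. The names `op_fn`, `op_table`, `apply_ops` and `run_ops` are made up for illustration and are not from the answer.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Signature shared by every entry in the table (hypothetical operations).
typedef float (*op_fn)(float);

__device__ float op_double(float x) { return 2.0f * x; }
__device__ float op_square(float x) { return x * x; }
__device__ float op_negate(float x) { return -x; }

// Device-side table; the initializers are the device addresses of the functions.
__device__ op_fn op_table[] = { op_double, op_square, op_negate };

// Each thread picks its function by index instead of walking a switch.
// The caller must ensure every index is within the bounds of op_table.
__global__ void apply_ops(const int *op_index, const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = op_table[op_index[i]](in[i]);
    }
}

// Host helper: copies the inputs to the device, runs the kernel and
// returns the results. Error checking is omitted for brevity.
std::vector<float> run_ops(const std::vector<int> &op_index,
                           const std::vector<float> &in)
{
    int n = static_cast<int>(in.size());
    std::vector<float> out(n);
    if (n == 0) return out;

    int *d_idx = nullptr;
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_idx, n * sizeof(int));
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_idx, op_index.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_in, in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 256;
    apply_ops<<<(n + threads - 1) / threads, threads>>>(d_idx, d_in, d_out, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_idx);
    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}
```

Calling `run_ops({0, 1, 2}, {3.0f, 3.0f, 3.0f})` applies a different operation to each element. The table tidies up the code, but it does not remove divergence: threads in the same warp that call different functions are still executed one path after another, and an indirect call may prevent the compiler from inlining the function body.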
