Debugging CUDA or how to get the source code in cuda-gdb without disabling optimization?

Question

I have a rather large and complex CUDA code that hangs quite reliably for a large number of blocks/threads. I am trying to figure out exactly where the code hangs.

When I run the code in cuda-gdb, I can see which threads/blocks are hanging, but I can't see where, apart from the "virtual PC".

If I compile the code with "-G" to get debugging information, it runs much slower and refuses to hang, no matter how long I run it.

Is there a way to map a “virtual PC” to a line of code in the source code, even roughly? Or is there a way to get debugging information without disabling all optimization?

I tried using "-G3", but to no avail. It just gives me warnings like "nvcc warning : Setting optimization level to 0 as optimized debugging is not supported". I am using the CUDA 4.1 compilation tools.

Pedro May 14, '12 at 17:49

1 answer

Pedro · Answer 1 · 2012-05-15T17:15:48+0000

Well, I think I figured it out myself.

If cuobjdump is in your PATH, then the cuda-gdb command x $pc will show you the assembly instruction at which the current thread is stopped. The catch is that if the source was not compiled with -G, you cannot relate that instruction to a line in your code.

To map the assembly to the kernel code, first make sure you compile the kernel with nvcc -keep [..] mykernel.cu. This should generate the files mykernel.sm_20.cubin (or whichever architecture you have chosen) and mykernel.ptx.

To get the assembly for the whole kernel, run cuobjdump -sass mykernel.sm_20.cubin > output.ptx. In cuda-gdb, do x/20i $pc-80 to get some context, and find those lines in the output.ptx file (which, despite its name, holds the SASS disassembly). You can then try to match these lines with the PTX code in mykernel.ptx, which contains .loc directives that refer to lines in the source.
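As a rough illustration of how the .loc directives can be used, here is a minimal Python sketch that pairs each PTX instruction with the most recent .loc seen before it. It assumes the ".loc <file index> <line> <column>" layout and treats every line ending in ";" that does not start with "." as an instruction. The function name annotate_ptx and the sample PTX are made up for this example and are not part of any NVIDIA tool.

```python
import re

# Matches ".loc <file_index> <line> <column>" (assumed layout of the directive).
LOC_RE = re.compile(r"^\s*\.loc\s+(\d+)\s+(\d+)\s+(\d+)")


def annotate_ptx(ptx_text):
    """Pair each PTX instruction with the (file_index, line) of the last .loc."""
    current = None  # stays None until the first .loc directive is seen
    annotated = []
    for raw in ptx_text.splitlines():
        line = raw.strip()
        match = LOC_RE.match(line)
        if match:
            current = (int(match.group(1)), int(match.group(2)))
            continue
        # Heuristic: instructions end with ';' and do not start with '.'
        # (lines starting with '.' are directives such as .reg or .param).
        if line.endswith(";") and not line.startswith("."):
            annotated.append((current, line))
    return annotated


# Hypothetical PTX fragment, only for demonstration.
SAMPLE_PTX = """
.loc 1 42 0
ld.global.f32 %f1, [%rd1];
fma.rn.f32 %f2, %f1, %f1, %f1;
.loc 1 43 0
st.global.f32 [%rd1], %f2;
"""

if __name__ == "__main__":
    for loc, instruction in annotate_ptx(SAMPLE_PTX):
        print(loc, instruction)
```

In practice you would read mykernel.ptx from disk and pass its contents to annotate_ptx, then search the result for the instructions you matched by hand.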

This approach requires a bit of creativity when matching the assembly from the cubin file against the PTX from nvcc, since the instructions may be reordered. In my code, I had large blocks of FFMA instructions that I could locate to get my bearings. You can use "output.ptx" to find the exact line from the debugger, and then look at "mykernel.ptx" in the same relative position.
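The "same relative position" idea can be sketched in Python as follows. This is only a proportional guess: SASS and PTX instructions do not correspond one-to-one, so the result is a window to start reading from, not an exact match. The function name ptx_window and its parameters are hypothetical.

```python
def ptx_window(sass_index, sass_count, ptx_instructions, radius=2):
    """Return the PTX instructions near the same relative position as a SASS one.

    sass_index: 0-based position of the stalled instruction in the SASS dump.
    sass_count: total number of SASS instructions in the kernel.
    ptx_instructions: list of PTX instruction strings for the same kernel.
    radius: how many neighbouring PTX instructions to include on each side.
    """
    if sass_count <= 0 or not ptx_instructions:
        return []
    # Scale the SASS position to the PTX listing using integer arithmetic,
    # then clamp to the last valid index.
    center = sass_index * len(ptx_instructions) // sass_count
    center = min(max(center, 0), len(ptx_instructions) - 1)
    lo = max(center - radius, 0)
    hi = min(center + radius + 1, len(ptx_instructions))
    return ptx_instructions[lo:hi]


if __name__ == "__main__":
    # Placeholder instruction names, only for demonstration.
    ptx = ["i%d" % n for n in range(10)]
    print(ptx_window(50, 100, ptx, radius=1))
```

Distinctive landmarks, such as the blocks of FFMA instructions mentioned above, are still needed to confirm or correct the window this returns.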

All of this involves quite a bit of work, but it allows you to narrow down the location of the "virtual PC" in the original source.
