Well, I think I figured it out myself.
If cuobjdump is on your PATH, then in cuda-gdb the command x $pc will show the assembly instruction at which the current thread is stopped. The problem is that if the source was not compiled with -G, you cannot directly associate that instruction with a line in your code.
To map the assembly back to the kernel code, first make sure you compile the kernel with nvcc -keep [..] mykernel.cu. This should generate the files mykernel.sm_20.cubin (or whatever other architecture you have chosen) and mykernel.ptx.
To get the assembly for the whole kernel, run cuobjdump -sass mykernel.cubin > output.ptx on the cubin generated above. In cuda-gdb, do x/20i $pc-80 to get some context, and find those lines in the output.ptx file. You can then try to match them against the PTX code in mykernel.ptx, which contains .loc directives that reference the line in the source.
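Once you have settled on a line in mykernel.ptx, the source line is given by the nearest .loc directive above it. Here is a minimal Python sketch of that lookup. The helper name nearest_loc and the sample PTX text are made up for illustration, and it assumes the usual ".loc file line column" layout:

```python
import re

# Assumes directives of the form ".loc <file_index> <line> <column>".
LOC_RE = re.compile(r"^\s*\.loc\s+(\d+)\s+(\d+)\s+(\d+)")

def nearest_loc(ptx_text, ptx_line_no):
    """Return (file_index, source_line) from the last .loc directive at or
    before the 1-based line ptx_line_no, or None if there is none."""
    result = None
    for no, line in enumerate(ptx_text.splitlines(), start=1):
        if no > ptx_line_no:
            break
        m = LOC_RE.match(line)
        if m:
            result = (int(m.group(1)), int(m.group(2)))
    return result

# Hypothetical PTX fragment, only to exercise the helper.
sample = """\
.loc 1 10 0
ld.param.u64 %rd1, [a];
.loc 1 12 0
fma.rn.f32 %f3, %f1, %f2, %f3;
fma.rn.f32 %f4, %f1, %f2, %f4;
"""

print(nearest_loc(sample, 5))  # (1, 12): PTX line 5 belongs to source line 12
```

The file index refers to the .file entries in the same PTX file, so you still have to look there to see which source file index 1 is.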
This approach requires a bit of creativity when matching the disassembly from the cubin file against the PTX from nvcc, since instructions can be reordered. In my code, I had large blocks of FFMA instructions that I could use to find my way around. You can use "output.ptx" to find the exact line from the debugger, and then look at "mykernel.ptx" in the same relative position.
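The "find the lines, then look at the same relative position" step can be roughly automated. The following Python sketch is only an approximation under assumptions: the function names are hypothetical, the sample dump is invented, and it assumes each instruction line in the cuobjdump output looks like "/*addr*/ OPCODE operands; /* encoding */", which may differ between toolkit versions:

```python
import re

COMMENT_RE = re.compile(r"/\*.*?\*/")          # strips /*0008*/ style comments
OPCODE_RE = re.compile(r"[A-Z][A-Z0-9_.]*")    # e.g. FFMA, LD.E, EXIT

def opcodes(sass_text):
    """Return [(line_no, opcode)] for lines that look like instructions."""
    found = []
    for no, line in enumerate(sass_text.splitlines(), start=1):
        tokens = COMMENT_RE.sub(" ", line).split()
        tokens = [t for t in tokens if not t.startswith("@")]  # drop predicates
        if tokens:
            name = tokens[0].rstrip(";")
            if OPCODE_RE.fullmatch(name):
                found.append((no, name.split(".")[0]))
    return found

def find_sequence(sass_text, wanted):
    """Return the dump line numbers where the opcode sequence 'wanted' starts."""
    ops = opcodes(sass_text)
    names = [op for _, op in ops]
    n = len(wanted)
    return [ops[i][0] for i in range(len(names) - n + 1) if names[i:i + n] == wanted]

def estimate_ptx_line(sass_line, sass_total, ptx_total):
    """Scale a line number in the dump to the same relative position in the PTX."""
    return sass_line * ptx_total // sass_total

# Invented dump fragment in the assumed layout.
sass = """\
        /*0000*/     MOV R1, c[0x1][0x100];    /* 0x2800440400005de4 */
        /*0008*/     FFMA R0, R2, R3, R0;
        /*0010*/     FFMA R4, R2, R3, R4;
        /*0018*/     ST [R6], R0;
        /*0020*/     EXIT;
"""

hits = find_sequence(sass, ["FFMA", "FFMA", "ST"])  # opcodes read off x/20i
print(hits, estimate_ptx_line(hits[0], 5, 50))
```

The estimate is only a starting point for searching in mykernel.ptx. Because of reordering, you still need to confirm the location by eye, for example by looking for the matching block of fma instructions.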
All of this involves quite a bit of work, but it allows you to narrow down the location of the "virtual PC" in the original source.