Well, I think I figured it out myself.
If cuobjdump is on your PATH, then in cuda-gdb the command x/i $pc shows the assembler instruction at which the current thread is stopped. The problem is that if the source was not compiled with -G , you cannot associate that instruction with a line in your code.
To map the assembler to the kernel code, first make sure you compile the kernel using nvcc -keep [..] mykernel.cu . This should generate the files mykernel.sm_20.cubin (or whichever architecture you have chosen) and mykernel.ptx .
To get the assembler for the whole kernel, run cuobjdump -sass mykernel.cubin > output.ptx (despite its name, this file holds the SASS disassembly, not PTX). In cuda-gdb, run x/20i $pc-80 to get some context, and find those lines in output.ptx . You can then try to match them against the PTX code in mykernel.ptx , which contains .loc directives that reference lines in the source.
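Once you have settled on a line in mykernel.ptx, reading the .loc directives by hand gets tedious. Here is a minimal Python sketch of that lookup, assuming the usual `.loc <file> <line> <column>` and `.file <index> "<name>"` directive layout. The function name source_line_for is made up for illustration and is not part of any CUDA tool.

```python
import re

# Assumed directive layouts: `.loc <file_index> <line> <column>`
# and `.file <file_index> "<file_name>"`.
LOC_RE = re.compile(r"^\s*\.loc\s+(\d+)\s+(\d+)\s+(\d+)")
FILE_RE = re.compile(r'^\s*\.file\s+(\d+)\s+"([^"]+)"')


def source_line_for(ptx_lines, index):
    """Return (file_name, line) from the nearest .loc at or before ptx_lines[index].

    Returns None when no .loc directive precedes that position.
    """
    # Collect the file table so that file indices can be turned into names.
    files = {}
    for text in ptx_lines:
        match = FILE_RE.match(text)
        if match:
            files[int(match.group(1))] = match.group(2)

    # Walk backwards from the chosen PTX line to the closest .loc directive.
    for i in range(index, -1, -1):
        match = LOC_RE.match(ptx_lines[i])
        if match:
            file_id, line = int(match.group(1)), int(match.group(2))
            return files.get(file_id, f"file#{file_id}"), line
    return None
```

To use it, read mykernel.ptx with open("mykernel.ptx").read().splitlines() and pass the zero-based index of the PTX line you matched.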
This approach requires a bit of creativity when matching the assembler from the cubin file against the PTX from nvcc , since the instructions can be reordered. In my code, I had large blocks of FFMA instructions that I could use to orient myself. You can use output.ptx to find the exact line from the debugger, and then look at mykernel.ptx at the same relative position.
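Finding the debugger lines in the dump can be scripted too. The rough Python sketch below assumes that cuda-gdb prefixes each instruction with an address and a `<symbol+offset>:` tag, and that cuobjdump wraps offsets and encodings in `/* ... */` comments. Check both against your own output. The names normalize and locate are hypothetical helpers, not part of any tool.

```python
import re

# Assumption: cuobjdump puts offsets and encodings inside /* ... */ comments.
COMMENT_RE = re.compile(r"/\*.*?\*/")
# Assumption: cuda-gdb lines look like "=> 0x... <symbol+offset>:\tINSTRUCTION".
GDB_PREFIX_RE = re.compile(r"^\s*(?:=>)?\s*0x[0-9a-fA-F]+\s*(?:<.*>)?:\s*")


def normalize(line):
    """Reduce a dump or debugger line to the bare instruction text."""
    text = COMMENT_RE.sub("", line)
    text = GDB_PREFIX_RE.sub("", text)
    return " ".join(text.split()).rstrip(";").strip()


def locate(dump_lines, debugger_lines):
    """Find the debugger instructions as a consecutive run in the dump.

    Returns (dump_line_index, relative_position) for the first match,
    or None when the run does not occur.
    """
    dump = [(i, normalize(line)) for i, line in enumerate(dump_lines)]
    dump = [(i, text) for i, text in dump if text]  # drop comment-only lines
    wanted = [text for text in map(normalize, debugger_lines) if text]
    if not wanted:
        return None
    texts = [text for _, text in dump]
    for start in range(len(texts) - len(wanted) + 1):
        if texts[start:start + len(wanted)] == wanted:
            return dump[start][0], start / len(texts)
    return None
```

The relative position is only a starting point for where to look in mykernel.ptx. Header lines in the dump count toward it, and the compiler may have moved instructions around, so expect to search a few lines either side.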
All of this involves quite a bit of work, but it allows you to narrow down the location of the "virtual PC" in the original source.
Pedro