cuda-memcheck reports this information for the CUDA kernel in release mode:
The error occurs only in release mode, and it does not occur when running under cuda-gdb.
Given the address 0x000002c8, how can I determine which code causes the error? I looked at the cached intermediate files (.ptx, .cubin, etc.) and see no obvious way to identify the failing source code.
This is on x86_64 Linux with CUDA 3.2.
UPDATE: It turned out to be a compiler bug in 3.2. Upgrading to 4.0 resolves the memcheck error. I was also able to disassemble the CUBIN with cuobjdump from 4.0, but because it was an optimized release build, it was very difficult to match the disassembly to the source code.
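For anyone trying the same thing, here is a rough sketch of how the reported address could be looked up in a disassembly listing. It assumes the address from cuda-memcheck is an instruction offset within the kernel, and that each instruction line in the listing starts with a hex offset comment such as `/*02c8*/`. The function name `find_instruction` and its parameters are my own, not part of any CUDA tool.

```python
import re

# Assumption: each instruction line in the listing begins with a hex
# offset wrapped in a C-style comment, e.g. "/*02c8*/".
_OFFSET = re.compile(r"^\s*/\*([0-9a-fA-F]+)\*/")


def find_instruction(sass_text, address, context=2):
    """Return the listing lines around the instruction at `address`.

    `context` is the number of lines to include before and after the
    match. Returns an empty list when no line carries that offset.
    """
    lines = sass_text.splitlines()
    for i, line in enumerate(lines):
        m = _OFFSET.match(line)
        if m and int(m.group(1), 16) == address:
            start = max(0, i - context)
            return lines[start:i + context + 1]
    return []
```

With the output of `cuobjdump -sass` saved to a text file (the file name is up to you), calling `find_instruction(text, 0x2c8)` returns the matching instruction and its neighbours. Offsets restart for every kernel in the listing, so this only returns the first match; with several kernels in one file, the listing would first have to be narrowed to the kernel named in the memcheck report. In an optimized build this still only narrows the problem down to a few instructions, not to a source line.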