cuda-memcheck error address - how to determine location in code?

cuda-memcheck reports the following for my CUDA kernel in release mode:

========= Error: process didn't terminate successfully
========= Invalid __global__ read of size 4
=========     at 0x000002c8 in xx_kernel
=========     by thread (0,0,0) in block (0,0)
=========     Address 0x10101600014 is out of bounds
=========
========= ERROR SUMMARY: 1 error

This error occurs only in release mode. It also does not occur when running under cuda-gdb.

How can I take the address 0x000002c8 and determine which code causes the error? I looked at the kept intermediate files (.ptx, .cubin, etc.) and I see no obvious way to identify the offending source line.

This is on x86_64 Linux with CUDA 3.2.

UPDATE: It turned out to be a compiler bug in 3.2. After upgrading to 4.0, the memcheck error no longer occurs. I was also able to disassemble the cubin with cuobjdump from 4.0, but because the build was release mode and optimized, it was very difficult to match the disassembly to the source code.

2 answers

Download CUDA Toolkit 4.0 from the NVIDIA Developer Zone. Use the new cuobjdump, which supports 2.x cubins.

cuobjdump -sass /path/to/your/cubin > /path/to/dump.txt

Sample output (tested on an sm_20 cubin, code version 2.3):

  ...
  /*6018*/  /*0xe00100075003ff9a*/  CAL 0x46d8;
  /*6020*/  /*0x10001de428000000*/  MOV R0, R4;
  /*6028*/  /*0x00001de428000000*/  MOV R0, R0;
  /*6030*/  /*0x40011de428000000*/  MOV R4, R16;
  ...
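To connect this dump back to the address in the question, one option is to search it for the offset that cuda-memcheck printed. The sketch below is only an illustration: `offsetTag` and `findInstruction` are hypothetical helper names, and it assumes that the reported address (0x000002c8) is an offset within the kernel's code and that the dump prints each offset as a four-digit lowercase hex comment, as in the sample output above. If your cuobjdump version formats offsets differently, the tag format needs adjusting.

```cpp
#include <cstdio>
#include <string>

// Build the comment tag that precedes an instruction in the dump,
// e.g. 0x2c8 -> "/*02c8*/". The four-digit format is an assumption
// taken from the sample output, not a documented guarantee.
inline std::string offsetTag(unsigned long address) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "/*%04lx*/", address);
    return buf;
}

// Return the dump line that carries the tag for `address`,
// or an empty string when no such line exists.
inline std::string findInstruction(const std::string& dump,
                                   unsigned long address) {
    const std::string tag = offsetTag(address);
    const std::size_t pos = dump.find(tag);
    if (pos == std::string::npos) return "";

    // Expand the match to the full line around it.
    std::size_t begin = dump.rfind('\n', pos);
    begin = (begin == std::string::npos) ? 0 : begin + 1;
    const std::size_t end = dump.find('\n', pos);
    return dump.substr(begin, end == std::string::npos
                                  ? std::string::npos
                                  : end - begin);
}
```

Calling `findInstruction(dumpText, 0x2c8)` on the contents of dump.txt would return the matching SASS line, if there is one. In an optimized build, mapping that instruction back to a source line still has to be done by hand.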

Errors like this in a kernel usually come from a memory access whose index depends on more than just the thread identifier.

Provided that every memory area you use was correctly allocated on the GPU, an access indexed only by something like threadIdx.x should cause no problems. So either:

  • you have an incorrect index calculation (this often happens with expressions like data[blockDim.y * blockDim.x * threadIdx.z + blockDim.x * threadIdx.y + threadIdx.x] ), or
  • you use another variable in the index that pushes it past the bounds of your array (for example, data[threadIdx.x + offset] )
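Both cases can be checked on the host before debugging on the device. The following is a minimal sketch in plain C++, not CUDA: `Dim3` is a hypothetical stand-in for CUDA's built-in `dim3`, and `flatThreadIndex` and `inBounds` are illustrative names. It mirrors the flattened-index expression from the first bullet and adds a bounds check for the offset case from the second.

```cpp
#include <cstddef>

// Hypothetical stand-in for CUDA's dim3, so the arithmetic can be
// exercised on the host without a GPU.
struct Dim3 {
    unsigned x, y, z;
};

// Flattened index of a thread within its block, with x varying fastest:
// blockDim.y * blockDim.x * z + blockDim.x * y + x.
inline std::size_t flatThreadIndex(Dim3 tid, Dim3 bdim) {
    return (static_cast<std::size_t>(tid.z) * bdim.y + tid.y) * bdim.x
           + tid.x;
}

// True when index + offset still addresses one of the n elements.
// Written as a subtraction so the sum cannot wrap around.
inline bool inBounds(std::size_t index, std::size_t offset, std::size_t n) {
    return index < n && offset < n - index;
}
```

For a 4x5x6 block, the last thread (3,4,5) maps to index 119, the final valid slot of a 120-element array, so any nonzero offset added to it is out of bounds. Inside a kernel, the same kind of check would guard the access so that out-of-range threads skip the read.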

---- Edit (following comments) ----
See @Cicada's answer for the cuobjdump approach on 2.x devices.
