In previous versions of CUDA, atomicAdd was not implemented for doubles, so it is usually implemented by hand with an atomicCAS loop, as in the code example below. With the new CUDA 8 RC, I run into problems when I try to compile code that includes such a function. I suspect this is because Pascal and Compute Capability 6.0 added a native double version of atomicAdd, but for some reason that declaration is not properly ignored for earlier compute capabilities.
The code below used to compile and run fine with previous CUDA versions, but now I get this compilation error:
test.cu(3): error: function "atomicAdd(double *, double)" has already been defined
But if I remove my implementation, I get this error:
test.cu(33): error: no instance of overloaded function "atomicAdd" matches the argument list
            argument types are: (double *, double)
I should add that I only see this when compiling with -arch=sm_35 or similar. If I compile with -arch=sm_60, I get the expected behavior: the first error when my implementation is present, and successful compilation when it is removed.
Edit: Also, this is specific to atomicAdd. If I change the name of my function, it works fine.
It really looks like a compiler bug. Can anyone else confirm that this is the case?
Code example:
__device__ double atomicAdd(double* address, double val)
{
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
}

__global__ void kernel(double *a)
{
    double b = 1.3;
    atomicAdd(a, b);
}

int main(int argc, char **argv)
{
    double *a;
    cudaMalloc(&a, sizeof(double));

    kernel<<<1,1>>>(a);

    cudaFree(a);
    return 0;
}
Edit: I received a response from Nvidia acknowledging this problem. Here is what the developers say about it:
The sm_60 architecture, which is newly supported in CUDA 8.0, has a native fp64 atomicAdd function. Because of limitations of our toolchain and the CUDA language, the declaration of this function needs to be present even when the code is not being compiled specifically for sm_60. This causes a problem in your code because you also define an fp64 atomicAdd function.
CUDA built-in functions such as atomicAdd are implementation-defined and can change between CUDA releases. Users should not define functions with the same names as any CUDA built-in functions. We would suggest that you rename your atomicAdd function to a name that does not match any CUDA built-in function.
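For anyone who lands here with the same error, this is a minimal sketch of the rename that Nvidia suggests. The names atomicAddDouble, accumulate and sum_on_device are my own hypothetical choices, not part of CUDA. Using the `__CUDA_ARCH__` check to forward to the native atomicAdd on compute capability 6.0 and above is an assumption on my part about what is convenient, not something from Nvidia's reply. Error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// Hypothetical name, chosen so it cannot collide with the built-in atomicAdd.
__device__ double atomicAddDouble(double* address, double val)
{
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 600
    // Compute capability 6.0+ has a native double-precision atomicAdd.
    return atomicAdd(address, val);
#else
    // Fallback for older architectures: the same atomicCAS loop as in the question.
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
#endif
}

// Hypothetical kernel: every thread adds val to the same accumulator.
__global__ void accumulate(double* sum, double val)
{
    atomicAddDouble(sum, val);
}

// Hypothetical host helper: launches `threads` threads in one block and
// returns the accumulated value (no CUDA error checking in this sketch).
double sum_on_device(int threads, double val)
{
    double* d_sum = nullptr;
    double h_sum = 0.0;
    cudaMalloc(&d_sum, sizeof(double));
    cudaMemcpy(d_sum, &h_sum, sizeof(double), cudaMemcpyHostToDevice);

    accumulate<<<1, threads>>>(d_sum, val);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_sum, d_sum, sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_sum);
    return h_sum;
}
```

With a working device, calling sum_on_device(256, 1.5) from main should return 384.0, and the file should compile with both -arch=sm_35 and -arch=sm_60 because nothing in it redefines a built-in name.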