CUDA atomicAdd for double definition error

In previous versions of CUDA, atomicAdd was not implemented for doubles, so it is common to implement it manually, as here. With the new CUDA 8 RC, I run into problems when I try to compile code that includes such a function. I suppose this is because Pascal and compute capability 6.0 have added a native double version of atomicAdd, but for some reason it is not properly ignored for earlier compute capabilities.

The code below used to compile and work with previous CUDA versions, but now I get this compilation error:

test.cu(3): error: function "atomicAdd(double *, double)" has already been defined 

But if I remove my implementation, I get this error:

 test.cu(33): error: no instance of overloaded function "atomicAdd" matches the argument list argument types are: (double *, double) 

I should add that I only see this when compiling with -arch=sm_35 or similar. If I compile with -arch=sm_60 , I get the expected behavior: only the first error in the first case, and successful compilation in the second.

Edit: Also, this is specific to atomicAdd: if I rename my function, it works fine.

It really looks like a compiler bug. Can anyone else confirm this?

Code example:

    __device__ double atomicAdd(double* address, double val)
    {
        unsigned long long int* address_as_ull = (unsigned long long int*)address;
        unsigned long long int old = *address_as_ull, assumed;
        do {
            assumed = old;
            old = atomicCAS(address_as_ull, assumed,
                            __double_as_longlong(val + __longlong_as_double(assumed)));
        } while (assumed != old);
        return __longlong_as_double(old);
    }

    __global__ void kernel(double *a)
    {
        double b = 1.3;
        atomicAdd(a, b);
    }

    int main(int argc, char **argv)
    {
        double *a;
        cudaMalloc(&a, sizeof(double));
        kernel<<<1,1>>>(a);
        cudaFree(a);
        return 0;
    }

Edit: I received a response from NVIDIA acknowledging the problem; here is what the developers say about it:

The sm_60 architecture, newly supported in CUDA 8.0, has native fp64 atomicAdd. Due to limitations of our toolchain and the CUDA language, the declaration of this function needs to be present even when the code is not being specifically compiled for sm_60. This causes a problem in your code because you also define a fp64 atomicAdd.

CUDA built-in functions such as atomicAdd are implementation-defined and can change between CUDA releases. Users should not define functions with the same names as any CUDA built-in function. We suggest that you rename your atomicAdd function to one that does not collide with any CUDA built-in function.
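Following that suggestion, a renamed wrapper could look like the sketch below. The name myAtomicAdd is illustrative (any name that is not a CUDA built-in works); the wrapper dispatches to the native instruction on compute capability 6.0+ and falls back to the usual CAS loop elsewhere:

```cuda
// Illustrative rename: "myAtomicAdd" is our own name, so it can never
// collide with a present or future CUDA built-in.
__device__ double myAtomicAdd(double* address, double val)
{
#if __CUDA_ARCH__ >= 600
    return atomicAdd(address, val);  // native fp64 atomicAdd on Pascal+
#else
    // Pre-Pascal fallback: emulate the fp64 add with a 64-bit CAS loop.
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);  // like atomicAdd, return the old value
#endif
}
```

Kernels then call myAtomicAdd everywhere, and the same source compiles for both -arch=sm_35 and -arch=sm_60.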

1 answer

This flavor of atomicAdd is a new intrinsic introduced for compute capability 6.0. You can keep your own implementation for earlier compute capabilities by guarding it with a macro check:

    #if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
    #else
    <... place here your own pre-pascal atomicAdd definition ...>
    #endif

This macro, the architecture identification macro, is documented here:

5.7.4. Virtual Architecture Identification Macro

The architecture identification macro __CUDA_ARCH__ is assigned a three-digit value string xy0 (ending in a literal 0) during each nvcc compilation stage 1 that compiles for compute_xy.

This macro can be used in the implementation of GPU functions to determine the virtual architecture for which it is currently being compiled. The host code (the non-GPU code) must not depend on it.
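A minimal illustration of that behavior (an illustrative snippet, not taken from the documentation): the macro is only defined during device-code compilation passes, and its value tracks the compute_xy target of the current pass.

```cuda
#include <cstdio>

// __CUDA_ARCH__ is defined only while compiling device code, so this
// kernel is specialized per device-compilation pass; host code must not
// branch on the macro.
__global__ void which_arch()
{
#if __CUDA_ARCH__ >= 600
    printf("compiled for compute capability >= 6.0\n");
#else
    printf("compiled for a pre-Pascal architecture\n");
#endif
}
```

Compiling the same kernel with -arch=sm_35 and with -arch=sm_60 selects the two branches respectively.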

I assume NVIDIA did not declare it for earlier compute capabilities in order to avoid conflicts for users who define it themselves and do not target compute capability >= 6.x. I would not consider this a bug, but rather a release engineering practice.

EDIT: the macro guard was incomplete (now fixed); here is a complete example.

    #if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
    #else
    __device__ double atomicAdd(double* a, double b) { return b; }
    #endif

    __device__ double s_global;

    __global__ void kernel()
    {
        atomicAdd(&s_global, 1.0);
    }

    int main(int argc, char* argv[])
    {
        kernel<<<1,1>>>();
        return ::cudaDeviceSynchronize();
    }

Compiling with:

    $> nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2016 NVIDIA Corporation
    Built on Wed_May__4_21:01:56_CDT_2016
    Cuda compilation tools, release 8.0, V8.0.26

Command lines (both successful):

    $> nvcc main.cu -arch=sm_60
    $> nvcc main.cu -arch=sm_35

You can see why it works by looking at the included header sm_60_atomic_functions.h , where the function is not declared if __CUDA_ARCH__ is less than 600.
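The guard in that header has roughly this shape (a paraphrase of its effect, not the verbatim header contents):

```cuda
// Paraphrased shape of the guard in sm_60_atomic_functions.h: the fp64
// overload is visible only when __CUDA_ARCH__ is undefined (host pass)
// or at least 600, so a pre-Pascal device pass never sees the declaration.
#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
__device__ double atomicAdd(double* address, double val);
#endif
```

This is also why the conflict only appears for -arch=sm_35 and similar: during the host pass the built-in declaration is present alongside the user's definition, while the sm_60 device pass supplies its own.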

