CUDA and nvcc: using the preprocessor to choose between float or double

Problem:

In a .h file, I want to define real to be double if compiling for C/C++, or for CUDA with compute capability >= 1.3. If compiling for CUDA with compute capability < 1.3, real should be defined as float.

After a few hours I came up with this (which does not work):

  #if defined(__CUDACC__)

 #warning * making definitions for cuda

 #if defined(__CUDA_ARCH__)
 #warning __CUDA_ARCH__ is defined
 #else
 #warning __CUDA_ARCH__ is NOT defined
 #endif

 #if (__CUDA_ARCH__ >= 130)
 #define real double
 #warning using double in cuda
 #elif (__CUDA_ARCH__ >= 0)
 #define real float
 #warning using float in cuda
 #warning how the hell is this printed when __CUDA_ARCH__ is not defined?
 #else
 #define real
 #error what the hell is the value of __CUDA_ARCH__ and how can I print it
 #endif

 #else
 #warning * making definitions for c/c++
 #define real double
 #warning using double for c/c++
 #endif

At compile time (note the -arch flag):

  nvcc -arch compute_13 -Ilibcutil testFloatDouble.cu 

I get

  * making definitions for cuda
 __CUDA_ARCH__ is defined
 using double in cuda

 * making definitions for cuda
 __CUDA_ARCH__ is NOT defined
 using float in cuda
 how the hell is this printed when __CUDA_ARCH__ is not defined?

 Undefined symbols for architecture i386:
   "myKernel(float *, int)", referenced from: ....

I know that files are compiled in two passes by nvcc. The first pass is fine (__CUDACC__ defined and __CUDA_ARCH__ >= 130), but what happens on the second pass? Is __CUDACC__ defined but __CUDA_ARCH__ undefined, or defined with a value < 130? Why?

Thank you for your time.

+4
preprocessor cuda nvcc
2 answers

It seems you may be conflating two things: how to differentiate between the host and device compilation trajectories when nvcc is processing CUDA code, and how to differentiate between CUDA and non-CUDA code. There is a subtle difference between the two. __CUDA_ARCH__ answers the first question, and __CUDACC__ answers the second.

Consider the following code snippet:

 #ifdef __CUDACC__

 #warning using nvcc

 template <typename T>
 __global__ void add(T *x, T *y, T *z)
 {
     int idx = threadIdx.x + blockDim.x * blockIdx.x;
     z[idx] = x[idx] + y[idx];
 }


 #ifdef __CUDA_ARCH__
 #warning device code trajectory
 #if __CUDA_ARCH__ > 120
 #warning compiling with double precision
 template void add<double>(double *, double *, double *);
 #else
 #warning compiling with single precision
 template void add<float>(float *, float *, float *);
 #endif
 #else
 #warning nvcc host code trajectory
 #endif
 #else
 #warning non-nvcc code trajectory
 #endif

Here we have a templated CUDA kernel with architecture-dependent instantiation, a separate stanza for host code steered through nvcc, and a stanza for host code not compiled by nvcc at all. It behaves like this:

 $ ln -s cudaarch.cu cudaarch.cc
 $ gcc -c cudaarch.cc -o cudaarch.o
 cudaarch.cc:26:2: warning: #warning non-nvcc code trajectory

 $ nvcc -arch=sm_11 -Xptxas="-v" -c cudaarch.cu -o cudaarch.cu.o
 cudaarch.cu:3:2: warning: #warning using nvcc
 cudaarch.cu:14:2: warning: #warning device code trajectory
 cudaarch.cu:19:2: warning: #warning compiling with single precision
 cudaarch.cu:3:2: warning: #warning using nvcc
 cudaarch.cu:23:2: warning: #warning nvcc host code trajectory
 ptxas info    : Compiling entry function '_Z3addIfEvPT_S1_S1_' for 'sm_11'
 ptxas info    : Used 4 registers, 12+16 bytes smem

 $ nvcc -arch=sm_20 -Xptxas="-v" -c cudaarch.cu -o cudaarch.cu.o
 cudaarch.cu:3:2: warning: #warning using nvcc
 cudaarch.cu:14:2: warning: #warning device code trajectory
 cudaarch.cu:16:2: warning: #warning compiling with double precision
 cudaarch.cu:3:2: warning: #warning using nvcc
 cudaarch.cu:23:2: warning: #warning nvcc host code trajectory
 ptxas info    : Compiling entry function '_Z3addIdEvPT_S1_S1_' for 'sm_20'
 ptxas info    : Used 8 registers, 44 bytes cmem[0]

The key points to take from this:

  • __CUDACC__ determines whether nvcc is steering the compilation or not
  • __CUDA_ARCH__ is always undefined when compiling host code, whether steered by nvcc or not
  • __CUDA_ARCH__ is defined only for the device code trajectory of compilation steered by nvcc

These three pieces of information are always enough to have conditional compilation for device code targeting different CUDA architectures, for host-side CUDA code, and for code not compiled by nvcc at all. The nvcc documentation is a little terse at times, but all of this is covered in its discussion of compilation trajectories. Applied to the float/double choice in the question, the rules might look like the sketch below.
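A header along these lines (an untested sketch, just one way of applying the rules) would pick double for device code of compute capability >= 1.3, float for older device code, and double for any host pass:

 /* sketch only: choose real per compilation trajectory */
 #if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 130)
 /* device pass, compute capability >= 1.3: doubles supported */
 #define real double
 #elif defined(__CUDA_ARCH__)
 /* device pass, older architecture: fall back to float */
 #define real float
 #else
 /* host pass (nvcc-steered or plain C/C++): __CUDA_ARCH__ undefined */
 #define real double
 #endif

One caveat: when compiling for a pre-1.3 architecture, the device pass sees real as float while the host pass sees double, so the two halves of the program disagree about sizeof(real). That mismatch is exactly the sort of thing that produces undefined-symbol errors like the one in the question, and it is why an explicit -D switch, as in the other answer, can be the safer route.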

+23

At the moment, the only practical solution I see is to use a custom define:

 #if (!defined(__CUDACC__) || defined(USE_DOUBLE_IN_CUDA))
 #define real double
 #warning defining double for cuda or c/c++
 #else
 #define real float
 #warning defining float for cuda
 #endif

and then

  nvcc -DUSE_DOUBLE_IN_CUDA -arch compute_13 -Ilibcutil testFloatDouble.cu

which outputs for the two compilation passes:

  #warning defining double for cuda or c/c++
 #warning defining double for cuda or c/c++

and

  nvcc -Ilibcutil testFloatDouble.cu 

produces

  #warning defining float for cuda
 #warning defining float for cuda
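
For completeness, a hypothetical consumer of the define (the kernel name echoes myKernel from the linker error above; the header name real.h is made up) shows that host and device code now always agree on the type:

 /* testFloatDouble.cu -- illustrative usage; assumes the #if block
    above lives in a header called real.h */
 #include <cstdio>
 #include "real.h"

 __global__ void myKernel(real *data, int n)
 {
     int idx = threadIdx.x + blockDim.x * blockIdx.x;
     if (idx < n)
         data[idx] *= (real) 2;
 }

 int main()
 {
     /* host and device see the same definition of real because both
        compilation passes get the same -DUSE_DOUBLE_IN_CUDA setting */
     printf("sizeof(real) = %u\n", (unsigned) sizeof(real));
     return 0;
 }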
+3
