It seems you are conflating two things: how to distinguish between the host and device compilation trajectories when nvcc is processing CUDA code, and how to distinguish between CUDA and non-CUDA code. There is a subtle difference between the two. __CUDA_ARCH__ answers the first question, and __CUDACC__ answers the second.
Consider the following code snippet:
```cpp
// cudaarch.cu
#ifdef __CUDACC__
#warning using nvcc

template <typename T>
__global__ void add(T *x, T *y, T *z)
{
    int idx = threadIdx.x + blockDim.x * blockIdx.x;

    z[idx] = x[idx] + y[idx];
}

#ifdef __CUDA_ARCH__
#warning device code trajectory
#if __CUDA_ARCH__ > 120
#warning compiling with double precision
template void add<double>(double *, double *, double *);
#else
#warning compiling with single precision
template void add<float>(float *, float *, float *);
#endif // __CUDA_ARCH__ > 120
#else
#warning nvcc host code trajectory
#endif // __CUDA_ARCH__
#else
#warning non-nvcc code trajectory
#endif // __CUDACC__
```
Here we have a templated CUDA kernel whose instantiation depends on the CUDA architecture, a separate stanza for host code steered by nvcc, and a stanza for host code compiled without nvcc. It behaves as follows:
```
$ ln -s cudaarch.cu cudaarch.cc
$ gcc -c cudaarch.cc -o cudaarch.o
cudaarch.cc:26:2: warning: #warning non-nvcc code trajectory

$ nvcc -arch=sm_11 -Xptxas="-v" -c cudaarch.cu -o cudaarch.cu.o
cudaarch.cu:3:2: warning: #warning using nvcc
cudaarch.cu:14:2: warning: #warning device code trajectory
cudaarch.cu:19:2: warning: #warning compiling with single precision
cudaarch.cu:3:2: warning: #warning using nvcc
cudaarch.cu:23:2: warning: #warning nvcc host code trajectory
ptxas info : Compiling entry function '_Z3addIfEvPT_S1_S1_' for 'sm_11'
ptxas info : Used 4 registers, 12+16 bytes smem

$ nvcc -arch=sm_20 -Xptxas="-v" -c cudaarch.cu -o cudaarch.cu.o
cudaarch.cu:3:2: warning: #warning using nvcc
cudaarch.cu:14:2: warning: #warning device code trajectory
cudaarch.cu:16:2: warning: #warning compiling with double precision
cudaarch.cu:3:2: warning: #warning using nvcc
cudaarch.cu:23:2: warning: #warning nvcc host code trajectory
ptxas info : Compiling entry function '_Z3addIdEvPT_S1_S1_' for 'sm_20'
ptxas info : Used 8 registers, 44 bytes cmem[0]
```
The take-away points from this are:
- __CUDACC__ tells you whether nvcc is steering the compilation or not
- __CUDA_ARCH__ is always undefined when compiling host code, whether steered by nvcc or not
- __CUDA_ARCH__ is defined only for the device code trajectory of a compilation steered by nvcc
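The three points above can be condensed into a single preprocessor ladder. What follows is only a minimal sketch, not something taken from the nvcc documentation; the names BuildPath and kBuildPath are made up for illustration.

```cpp
// Sketch only: BuildPath and kBuildPath are hypothetical names, not CUDA APIs.
enum class BuildPath { NvccDevice, NvccHost, NonNvccHost };

#if defined(__CUDA_ARCH__)
// Defined only while nvcc is on the device code trajectory.
constexpr BuildPath kBuildPath = BuildPath::NvccDevice;
#elif defined(__CUDACC__)
// nvcc is steering the build, but this is the host code trajectory.
constexpr BuildPath kBuildPath = BuildPath::NvccHost;
#else
// A plain host compiler such as gcc, with no nvcc involved.
constexpr BuildPath kBuildPath = BuildPath::NonNvccHost;
#endif
```

The order of the tests matters: __CUDA_ARCH__ has to be checked first, because __CUDACC__ is defined on both nvcc trajectories.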
These three pieces of information are always enough to set up conditional compilation for device code targeting different CUDA architectures, for host-side CUDA code, and for code that is not compiled by nvcc at all. The nvcc documentation is a bit terse at times, but all of this is covered in its discussion of compilation trajectories.
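One common way to put the __CUDACC__ check to work is a header that builds both with nvcc and with a plain host compiler. The sketch below assumes that is the goal; HOST_DEVICE and add_pair are hypothetical names chosen for the example.

```cpp
// Sketch only: HOST_DEVICE and add_pair are hypothetical names.
#ifdef __CUDACC__
// nvcc is steering: make the function callable from host and device code.
#define HOST_DEVICE __host__ __device__
#else
// Plain host compiler: the CUDA qualifiers do not exist, so expand to nothing.
#define HOST_DEVICE
#endif

// Element-wise sum that can be shared by kernels and ordinary host code.
template <typename T>
HOST_DEVICE constexpr T add_pair(T x, T y)
{
    return x + y;
}
```

With gcc the macro vanishes and add_pair is an ordinary function template; with nvcc the same source should also be usable inside a kernel such as add above.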
talonmies