Compilation of code containing dynamic parallelism fails

I am working with dynamic parallelism using CUDA 5.5 and an NVIDIA GeForce GTX 780, whose compute capability is 3.5. I launch a kernel from inside another kernel, but compilation fails with this error:

error: calling a __global__ function("kernel_6") from a __global__ function("kernel_5") is only allowed on the compute_35 architecture or above

What am I doing wrong?

3 answers

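Dynamic parallelism needs the sm_35 target, relocatable device code (-rdc=true), and the device runtime library (-lcudadevrt). You can compile and link in one step like this: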

nvcc -arch=sm_35 -rdc=true simple1.cu -o simple1 -lcudadevrt 

or

If you have two files, simple1.cu and test.c, you can compile them separately and link afterwards, as shown below. This is called separate compilation.

 nvcc -arch=sm_35 -dc simple1.cu
 nvcc -arch=sm_35 -dlink simple1.o -o link.o -lcudadevrt
 g++ -c test.c
 g++ link.o simple1.o test.o -o simple -L/usr/local/cuda/lib64/ -lcudart

The same is explained in the CUDA Programming Guide.
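For reference, here is a minimal sketch of what a file like simple1.cu might contain. The question does not show its source, so the names parent_kernel, child_kernel, and run_parent are hypothetical and only illustrate a kernel launching another kernel. It should build with the single-step nvcc command from this answer on a device of compute capability 3.5 or higher.

```cuda
// simple1.cu: illustrative sketch only; all names here are hypothetical.
#include <cstdio>
#include <cuda_runtime.h>

// Child kernel, launched from device code.
__global__ void child_kernel(int parent_block)
{
    printf("child thread %d launched by parent block %d\n",
           threadIdx.x, parent_block);
}

// Parent kernel: the first thread of each block launches a child grid.
// This device-side launch is what requires sm_35, -rdc=true and -lcudadevrt.
__global__ void parent_kernel()
{
    if (threadIdx.x == 0) {
        child_kernel<<<1, 4>>>(blockIdx.x);
    }
}

// Host helper: launch the parent grid and wait for it (and its children).
cudaError_t run_parent()
{
    parent_kernel<<<2, 32>>>();
    cudaError_t err = cudaGetLastError();   // catches launch failures
    if (err != cudaSuccess) {
        return err;
    }
    return cudaDeviceSynchronize();         // catches execution failures
}

int main()
{
    cudaError_t err = run_parent();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Compiling the same file without -arch=sm_35 (or with an older architecture) is expected to reproduce the error from the question, because the device-side launch in parent_kernel is not available below compute_35.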


In Visual Studio 2010:

 1) View -> Property Pages
 2) Configuration Properties -> CUDA C/C++ -> Common -> Generate Relocatable Device Code -> Yes (-rdc=true)
 3) Configuration Properties -> CUDA C/C++ -> Device -> Code Generation -> compute_35,sm_35
 4) Configuration Properties -> Linker -> Input -> Additional Dependencies -> cudadevrt.lib

You need to tell nvcc to generate compute capability (CC) 3.5 code for your device. This can be done by adding the following option to the nvcc command line:

  -gencode arch=compute_35,code=sm_35 

See the CUDA samples on dynamic parallelism for more details. They include both the command-line options and the project settings for all supported operating systems.

http://docs.nvidia.com/cuda/cuda-samples/index.html#simple-quicksort--cuda-dynamic-parallelism-
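If you are unsure which architecture to target, a small host program can report the compute capability of the installed device. The sketch below uses the runtime call cudaGetDeviceProperties. The helper supports_dynamic_parallelism is a hypothetical name introduced here, and checking device 0 is an assumption that suits a single-GPU machine.

```cuda
// Illustrative sketch: report the compute capability of device 0.
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: dynamic parallelism needs compute capability >= 3.5.
bool supports_dynamic_parallelism(int major, int minor)
{
    return major > 3 || (major == 3 && minor >= 5);
}

int main()
{
    int device = 0;  // assumption: the first device is the one of interest
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    if (supports_dynamic_parallelism(prop.major, prop.minor)) {
        printf("Dynamic parallelism is supported; target compute_35 or higher.\n");
    } else {
        printf("Dynamic parallelism is not supported on this device.\n");
    }
    return 0;
}
```

On the GTX 780 from the question this should report compute capability 3.5, which matches the compute_35/sm_35 pair used in the -gencode option.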
