Calling a kernel from a kernel

Follow-up to the question: CUDA: calling a __device__ function from a kernel

I am trying to speed up the sort operation. The following is a simplified pseudo version:

    // some costly swap operation
    __device__ void swap(float* ptrA, float* ptrB){
        float saveData;       // swap some
        saveData = *ptrA;     // big complex
        *ptrA = *ptrB;        // data chunk
        *ptrB = saveData;
    }

    // a rather simple sort operation
    __global__ void sort(float data[]){
        for (int i = 0; i < limit; i++){
            // find left swap point
            // find right swap point
            swap<<<1,1>>>(left, right);
        }
    }

(Note: this simplified version does not show the reduction techniques used within blocks.) The idea is that identifying the swap points is easy (quick), while the swap operation itself is expensive (slow). So use one block to find the swap points and other blocks to perform the swaps, that is, do the actual swapping in parallel. It sounds like a decent plan. But if the compiler inlines the device call, then no parallel swapping takes place. Is there a way to tell the compiler NOT to inline a device call?

2 answers

Edit (2016):

Dynamic parallelism was introduced with the second generation of Kepler GPUs. Launching kernels from the device is supported on compute capability 3.5 and higher.
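To see whether a given GPU meets the compute capability 3.5 requirement, the device properties can be queried at runtime. The following is only a minimal sketch: the helper name `supportsDeviceLaunch` is made up for illustration, and checking device 0 is an assumption.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if the compute capability is 3.5 or higher.
bool supportsDeviceLaunch(int major, int minor) {
    return major > 3 || (major == 3 && minor >= 5);
}

int main() {
    int device = 0;  // assumption: inspect the first device only
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceProperties failed: %s\n",
                    cudaGetErrorString(err));
        return 1;
    }
    std::printf("%s: compute capability %d.%d, kernel launches from device code %s\n",
                prop.name, prop.major, prop.minor,
                supportsDeviceLaunch(prop.major, prop.minor)
                    ? "should be supported" : "are not supported");
    return 0;
}
```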


Original answer:

You will need to wait until the end of the year, when the next generation of hardware becomes available. No current CUDA device can launch a kernel from another kernel; this is not currently supported.


I know a lot of time has passed since this question was asked. I landed on this page while searching for the same problem, and it looks like I found a solution.

Solution:

I somehow reached here and saw a cool approach to launching a kernel from within another kernel.

    __global__ void kernel_child(float *var1, int N){
        // do data operations here
    }

    __global__ void kernel_parent(float *var1, int N)
    {
        kernel_child<<<1,2>>>(var1, N);
    }

Dynamic parallelism in CUDA 5.0 and above made this possible. Also make sure that you compile for, and run on, the compute_35 architecture or higher.

From the terminal:

You can build and run the parent kernel above (which in turn launches the child kernel) from the terminal. Tested on a Linux machine.

    $ nvcc -arch=sm_35 -rdc=true yourFile.cu
    $ ./a.out
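For reference, here is a sketch of what a complete file for these commands might look like, applied to the swap idea from the question. The names `swap_chunks`, `parent` and `runSwapDemo`, the chunk size of 256 and the 128 threads per block are all illustrative assumptions, not taken from the original posts. A single parent thread launches a child grid that swaps two chunks in parallel, and the host then checks the result.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Child kernel: each thread swaps one element between two chunks.
__global__ void swap_chunks(float* a, float* b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float tmp = a[i];
        a[i] = b[i];
        b[i] = tmp;
    }
}

// Parent kernel: one thread picks the chunks and launches the child grid.
__global__ void parent(float* data, int chunk) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        int threads = 128;  // assumption: arbitrary block size
        int blocks = (chunk + threads - 1) / threads;
        swap_chunks<<<blocks, threads>>>(data, data + chunk, chunk);
    }
}

// Hypothetical host helper: returns true if the two halves were swapped.
bool runSwapDemo() {
    const int chunk = 256;
    const int n = 2 * chunk;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = static_cast<float>(i);

    float* dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(float)) != cudaSuccess) return false;
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    parent<<<1, 1>>>(dev, chunk);
    // The parent grid is not considered finished until its child grid has
    // finished, so this host-side synchronization waits for both.
    cudaError_t err = cudaDeviceSynchronize();

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < chunk; ++i) {
        if (host[i] != static_cast<float>(i + chunk)) return false;
        if (host[i + chunk] != static_cast<float>(i)) return false;
    }
    return true;
}

int main() {
    bool ok = runSwapDemo();
    std::printf(ok ? "swap OK\n" : "swap FAILED\n");
    return ok ? 0 : 1;
}
```

The sketch deliberately avoids synchronizing inside the parent kernel and relies on the host-side `cudaDeviceSynchronize()` instead. Depending on the toolkit version, the link step may also need `-lcudadevrt`, and newer toolkits may require a more recent `-arch` value than `sm_35`.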

Hope this helps. Thanks!
