I know a lot of time has passed since this question was asked, but while looking for a solution to the same problem I landed on this page, and I think I found one.
Solution:
I came across a neat approach: launching a kernel from within another kernel.
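__global__ void kernel_child(float *var1, int N){ /* do data operations on var1 here */ }
__global__ void kernel_parent(float *var1, int N){ kernel_child<<<1, 2>>>(var1, N); /* illustrative launch config */ }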
Dynamic parallelism, introduced in CUDA 5.0, is what makes this possible. It requires a GPU of compute capability 3.5 or higher, so make sure you compile for compute_35/sm_35 or a newer architecture.
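For reference, here is a minimal, self-contained sketch of the same parent/child pattern with host code added, assuming CUDA 5.0 or later and a device of compute capability 3.5 or higher. The kernel names come from the answer; the child's body (writing 2*i into each element), the launch configurations, and the `run_parent_child` helper are hypothetical, chosen only so the result can be checked.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Child kernel: each thread writes 2*i into its element (illustrative work).
__global__ void kernel_child(float *var1, int N) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N) {
        var1[i] = 2.0f * i;
    }
}

// Parent kernel: one thread launches the child grid from device code.
// This device-side launch is what dynamic parallelism enables.
__global__ void kernel_parent(float *var1, int N) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        int threads = 256;
        int blocks = (N + threads - 1) / threads;
        kernel_child<<<blocks, threads>>>(var1, N);
    }
}

// Hypothetical helper: runs the parent (and therefore the child) kernel
// and copies the results back to host memory.
cudaError_t run_parent_child(float *host_out, int N) {
    float *d = nullptr;
    cudaError_t err = cudaMalloc(&d, N * sizeof(float));
    if (err != cudaSuccess) return err;

    kernel_parent<<<1, 1>>>(d, N);
    err = cudaGetLastError();
    if (err == cudaSuccess) {
        // Waits for the parent grid; a parent grid is not complete
        // until all child grids it launched have completed.
        err = cudaDeviceSynchronize();
    }
    if (err == cudaSuccess) {
        err = cudaMemcpy(host_out, d, N * sizeof(float), cudaMemcpyDeviceToHost);
    }
    cudaFree(d);
    return err;
}

int main() {
    const int N = 1000;
    static float h[N];
    cudaError_t err = run_parent_child(h, N);
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < N; ++i) {
        assert(h[i] == 2.0f * i);  // the child grid wrote 2*i everywhere
    }
    std::printf("ok\n");
    return 0;
}
```

It can be built with something like `nvcc -arch=sm_35 -rdc=true parent_child.cu -lcudadevrt` (the file name is hypothetical); `-lcudadevrt` links the device runtime library, as the CUDA Programming Guide's dynamic parallelism examples do. Recent toolkits (CUDA 12 and later) no longer target sm_35, so on those substitute your GPU's architecture, for example `-arch=sm_70`.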
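Terminal way:
A program whose host code launches the parent kernel above (which in turn launches the child kernel) can be compiled and run from the terminal. The -rdc=true flag generates relocatable device code, which device-side kernel launches require. Tested on a Linux machine.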
$ nvcc -arch=sm_35 -rdc=true yourFile.cu
$ ./a.out
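Before running a dynamic-parallelism binary, it can help to confirm that the GPU really has compute capability 3.5 or higher. The sketch below queries each device with `cudaGetDeviceProperties` and reports whether it meets that minimum; the `cc_at_least_35` helper is a hypothetical name introduced here, split out as a plain function so the comparison can be checked on its own.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: dynamic parallelism needs compute capability >= 3.5.
bool cc_at_least_35(int major, int minor) {
    return major > 3 || (major == 3 && minor >= 5);
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("no CUDA device found\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
            continue;  // skip devices whose properties cannot be read
        }
        std::printf("device %d (%s): compute capability %d.%d, dynamic parallelism %s\n",
                    dev, prop.name, prop.major, prop.minor,
                    cc_at_least_35(prop.major, prop.minor) ? "supported" : "not supported");
    }
    return 0;
}
```

This check uses only host-side runtime calls, so it builds with a plain `nvcc check_cc.cu` (hypothetical file name) and does not need `-rdc=true`.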
Hope this helps. Thanks!