I am going to start converting a program I wrote to CUDA, in the hope of increasing its processing speed.
Now, my old program performs many functions one after another; I separated these functions out of my main program and call each one in order.
int main() {
    /* initialization of variables */
    function1();
    function2();
    function3();
    /* print result */
}
These functions are inherently serial, because function2 depends on the results of function1.
OK, now I want to convert these functions into kernels and run the work inside each function in parallel.
Is it as simple as rewriting each function to be parallel and then, in my main program, launching each kernel one by one? Is that slower than necessary? For example, can my GPU execute the next parallel operation without returning to the CPU to launch the next kernel?
Obviously, I will keep all the run-time variables in GPU memory to limit the amount of data transfer, so should I even worry about the time it takes between kernel calls?
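To make the question concrete, here is a minimal sketch of what I mean. The kernels (function1, function2, function3) and the data layout are made up for illustration; the point is that the data stays resident in GPU memory and the kernels are launched back to back from the host:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-ins for my real functions, each doing trivial per-element work.
__global__ void function1(float *d, int n) { int i = blockIdx.x * blockDim.x + threadIdx.x; if (i < n) d[i] += 1.0f; }
__global__ void function2(float *d, int n) { int i = blockIdx.x * blockDim.x + threadIdx.x; if (i < n) d[i] *= 2.0f; }
__global__ void function3(float *d, int n) { int i = blockIdx.x * blockDim.x + threadIdx.x; if (i < n) d[i] -= 3.0f; }

int main() {
    const int n = 1024;
    float *d;
    cudaMalloc(&d, n * sizeof(float));      // data lives on the GPU the whole time
    cudaMemset(d, 0, n * sizeof(float));

    dim3 block(256), grid((n + 255) / 256);
    function1<<<grid, block>>>(d, n);       // launch returns to the host immediately
    function2<<<grid, block>>>(d, n);       // queued behind function1 on the same stream
    function3<<<grid, block>>>(d, n);       // queued behind function2

    float result;
    cudaMemcpy(&result, d, sizeof(float), cudaMemcpyDeviceToHost);  // implicitly waits for the kernels
    printf("%f\n", result);                 // (0 + 1) * 2 - 3 = -1
    cudaFree(d);
    return 0;
}
```

My understanding (please correct me if wrong) is that launches on the same stream execute in issue order on the GPU, so the dependency of function2 on function1 is preserved without any explicit CPU-side synchronization between the calls.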
I hope this question is clear; if not, please ask me to clarify. Thanks.
And here is an additional question so I can sanity-check my reasoning. Ultimately, the input to this program is a video file, and through the various functions each frame produces a result. My plan is to grab several frames at a time (say, 8 unique frames), divide my total number of blocks among those 8 frames, and then have the threads within each block perform further data-parallel operations on the images, such as vector addition, Fourier transforms, etc.
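Here is roughly how I picture the block-to-frame mapping; the names (FRAMES, w, h) and the per-pixel work are placeholders I made up, not my actual functions. One grid dimension selects the frame, and the other covers the pixels of that frame:

```cuda
// Hypothetical sketch: frames are packed contiguously in one device buffer,
// gridDim.y = number of frames, and x-blocks/threads cover one frame's pixels.
__global__ void perPixelOp(float *frames, int w, int h)
{
    int frame = blockIdx.y;                           // one grid row per frame
    int pix   = blockIdx.x * blockDim.x + threadIdx.x;
    if (pix < w * h)
        frames[frame * w * h + pix] *= 0.5f;          // stand-in for the real per-pixel work
}

// Example launch for 8 frames of w*h pixels each (FRAMES, d_frames assumed):
//   dim3 block(256);
//   dim3 grid((w * h + 255) / 256, FRAMES);
//   perPixelOp<<<grid, block>>>(d_frames, w, h);
```

Is this the kind of decomposition people normally use, or is there a better way to spread the blocks across the frames?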
Is this the right way to approach the problem?
Shawn tabrizi