I currently have a parallel loop like this:
int testValues[16] = {5,2,2,10,4,4,2,100,5,2,4,3,29,4,1,52};

parallel_for(1, 100, 1, [&](int i) {
    int var4;
    int values[16] = {-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1};
    /* ...nested for loops */
    for (var4 = 0; var4 < 16; var4++) {
        if (values[var4] != testValues[var4]) break;
    }
    /* ...end nested loops */
});
I have optimized this as much as I can, to the point that the only thing left to do is add more computing resources.
I am interested in using the GPU for parallel task processing. I have read that embarrassingly parallel tasks like this can make efficient use of a modern GPU.
Using any language, what is the easiest way to run a simple parallel for loop like this on a GPU?
I don't know anything about GPU architectures or GPU native code.
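For concreteness, here is a rough CUDA-style sketch of what I imagine the GPU version would need to do, pieced together from examples I found online (the kernel name, launch configuration, and the d_testValues symbol are just placeholders, and I have no idea whether this is even close to the right approach):

#include <cuda_runtime.h>

// Device-side copy of the table to compare against
__constant__ int d_testValues[16];

// One GPU thread handles one value of i from the original loop
__global__ void searchKernel(int start, int end)
{
    int i = start + blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= end) return;

    int values[16] = {-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1};

    /* ...nested for loops, same work as the CPU version... */
    for (int var4 = 0; var4 < 16; var4++) {
        if (values[var4] != d_testValues[var4]) break;
    }
    /* ...end nested loops */
}

int main()
{
    int testValues[16] = {5,2,2,10,4,4,2,100,5,2,4,3,29,4,1,52};
    cudaMemcpyToSymbol(d_testValues, testValues, sizeof(testValues));

    // Cover indices 1..99, matching parallel_for(1, 100, 1, ...)
    int threadsPerBlock = 128;
    int blocks = (99 + threadsPerBlock - 1) / threadsPerBlock;
    searchKernel<<<blocks, threadsPerBlock>>>(1, 100);
    cudaDeviceSynchronize();
    return 0;
}

Is something along these lines what I should be aiming for, or is there a simpler way to express it?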