I am working on translating a CUDA application (this, if you must know) into OpenCL. The original application uses the C-style CUDA API with a single stream, just to avoid the automatic waiting when reading the results.
Now I notice that OpenCL command queues look a lot like CUDA streams. But in the device read command, as well as in the device write and kernel execution commands, I also see parameters for events. So I wonder: what does it take to execute a device write, a number of kernels (for example, one call to one kernel, then 100 calls to another kernel), and a device read, all in sequence?
- If I just enqueue them in sequence on the same queue, will they execute in order, as in CUDA?
- If that does not work, can/should I daisy-chain the events so that each call's wait list contains the previous call's event?
- Or should I add all previous events to each call's wait list, in case there is an N^2 dependency search or something like that?
- Or do I just have to call event.wait() after each call individually, as the AMD tutorial says?
Thanks!