What is the difference between clEnqueueBarrier and clFinish?

The OpenCL 1.1 specification says:

cl_int clEnqueueBarrier (cl_command_queue command_queue)

clEnqueueBarrier is a synchronization point that ensures that all queued commands in command_queue have finished execution before the next batch of commands can begin execution.

cl_int clFinish (cl_command_queue command_queue)

Blocks until all previously queued OpenCL commands in command_queue are issued to the associated device and have completed. clFinish does not return until all queued commands in command_queue have been processed and completed. clFinish is also a synchronization point.

I think this has something to do with in-order versus out-of-order execution, but I don't see the difference. Are these calls ever needed if I use an in-order queue? At the moment I am doing something like:

    ...
    for (...) {
        clEnqueueNDRangeKernel(...);
        clFlush(command_queue);
        clFinish(command_queue);
    }
    ...

on an Nvidia GPU. Any relevant comments are welcome.


You need to enqueue a barrier if you are using an out-of-order queue; it is one way to enforce dependencies between commands. You can also use cl_event objects to ensure proper ordering of commands in the command queue.

If you write your code so that you call clFinish after each kernel call, then using clEnqueueBarrier will have no effect, since you already enforce ordering.

clEnqueueBarrier is useful in a case like:

    clEnqueueNDRangeKernel(queue, kernel1);
    clEnqueueBarrier(queue);
    clEnqueueNDRangeKernel(queue, kernel2);

In this case, kernel2 depends on the results of kernel1. If the queue is out-of-order, then without the barrier kernel2 could execute before kernel1, causing incorrect behavior. You can achieve the same ordering with:

    clEnqueueNDRangeKernel(queue, kernel1);
    clFinish(queue);
    clEnqueueNDRangeKernel(queue, kernel2);

because clFinish waits until the queue is empty (all kernels and data transfers have completed). However, in this case clFinish also blocks the host until kernel1 finishes, whereas clEnqueueBarrier should return control to the application immediately (allowing you to enqueue more kernels or do other useful work).

As a side note, I think clFinish implicitly calls clFlush, so you don't need to call clFlush every time.
