Below are links to the paper, the slides (PPTX), and the source code for several lock-free data structures, including a skip list and a priority queue. The source code is CUDA rather than OpenCL, but CUDA is close enough to OpenCL that you can follow the essence of the implementation and carry it over to OpenCL.
The priority queue is synchronized using atomic operations. The queue nodes are allocated on the host and passed to the kernels as a global array of nodes. A new node is obtained by atomically incrementing a counter that indexes into that array.
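The allocation scheme described above can be sketched on the CPU with `std::atomic`. This is only an illustration, not the authors' code: the `Node` layout, the `NodePool` type, and the `allocate` function are hypothetical names, and `fetch_add` stands in for CUDA's `atomicAdd` (or OpenCL's `atomic_inc`) on a counter in global memory.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node layout; the struct used in the actual source may differ.
struct Node {
    int key;
    int next;  // index of the next node in the pool, -1 for none
};

// Preallocated pool of nodes, standing in for the global node array
// that the host allocates and passes to the kernels.
struct NodePool {
    std::vector<Node> nodes;
    std::atomic<int> counter{0};  // index of the next unused node

    explicit NodePool(std::size_t capacity) : nodes(capacity) {
        assert(capacity > 0);
    }

    // Claim a fresh node index; returns -1 once the pool is exhausted.
    // Each caller gets a distinct index because fetch_add is atomic.
    int allocate() {
        int idx = counter.fetch_add(1, std::memory_order_relaxed);
        return idx < static_cast<int>(nodes.size()) ? idx : -1;
    }
};
```

Because every thread receives a different index from the counter, no two threads ever initialize the same node, and no lock is needed for allocation.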
Nodes are inserted into the queue using atomic compare-and-exchange (CAS) calls. The paper and the PPTX slides explain how this works and the concurrency problems involved.
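To give a feel for CAS-based insertion, here is a minimal CPU-side sketch of inserting into a sorted singly linked list. It is a deliberate simplification and not the data structure from the paper: it supports insertion only (no deletion, so no ABA handling), and `PQNode` and `insert` are hypothetical names. `compare_exchange_strong` plays the role of CUDA's `atomicCAS` (or OpenCL's `atomic_cmpxchg`).

```cpp
#include <atomic>
#include <cassert>

// Hypothetical node type for the sketch.
struct PQNode {
    int key;
    std::atomic<PQNode*> next{nullptr};
    explicit PQNode(int k) : key(k) {}
};

// Insert `node` into a list sorted by ascending key.
// `head` is a sentinel node that is never removed.
void insert(PQNode* head, PQNode* node) {
    assert(head != nullptr && node != nullptr);
    while (true) {
        // Find the first node whose key is not smaller than the new key.
        PQNode* pred = head;
        PQNode* curr = pred->next.load(std::memory_order_acquire);
        while (curr != nullptr && curr->key < node->key) {
            pred = curr;
            curr = curr->next.load(std::memory_order_acquire);
        }
        // Link the new node in front of `curr`, then try to publish it.
        node->next.store(curr, std::memory_order_relaxed);
        if (pred->next.compare_exchange_strong(curr, node,
                                               std::memory_order_release,
                                               std::memory_order_relaxed)) {
            return;  // CAS succeeded: the node is now visible to others
        }
        // Another thread changed pred->next in the meantime; search again.
    }
}
```

The CAS succeeds only if `pred->next` still points at `curr`, so a concurrent insertion at the same position forces a retry instead of silently losing a node. Supporting deletion as well is where the harder concurrency problems discussed in the paper come in.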
http://www.cse.iitk.ac.in/users/mainakc/projects.html
Look for the following entry on that page:
Concurrent Programming / Runtime Support [ICPADS 2012] [PDF] [Source] [Voice Slides (PPTX)] Prabhakar Misra and Mainak Chaudhuri. Performance Evaluation of Concurrent Lock-free Data Structures on GPUs. Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems, pp. 53-60, December 2012.
Link to the source code: http://www.cse.iitk.ac.in/users/mainakc/lockfree.html