As OpenCL implementations mature, it is possible to achieve good levels of performance portability for your kernels across a wide range of devices. Some recent work in my research group has shown that, in some cases, OpenCL codes achieve a similar fraction of hardware peak performance on the CPU and the GPU. On the CPU, the OpenCL kernels were very effectively auto-vectorized by Intel's OpenCL CPU implementation. On the GPU, efficient code was generated for HPC and desktop devices from Nvidia (whose OpenCL implementation still works surprisingly well) and AMD.
If you are writing your OpenCL code for the GPU anyway, you often get a fast multi-core + SIMD version for free by running the same code on the CPU.
For two recent papers from my group that describe in detail the performance portability results we have achieved across four different real applications with OpenCL, see:
"On the Performance Portability of Structured Grid Codes on Many-Core Computer Architectures," S.N. McIntosh-Smith, M. Boulton, D. Curran and J.R. Price, ISC, Leipzig, June 2014. DOI: 10.1007/978-3-319-07518-1_4
"High Performance in Silico Virtual Drug Screening on Many-Core Processors," S. McIntosh-Smith, J. Price, R.B. Sessions and A.A. Ibarra, IJHPCA, 2014. DOI: 10.1177/1094342014528252
simonmcs