What is the advantage of running OpenCL on a CPU?

I am learning OpenCL programming and notice something strange.

Namely, when I list all OpenCL-enabled devices on my computer (Macbook Pro), I get the following list:

  • Intel(R) Core(TM) i7-4850HQ CPU @ 2.30 GHz
  • Iris Pro
  • GeForce GT 750M

The first is my processor, the second is an integrated graphics solution from Intel, and the third is my dedicated graphics card.

From what I have read, Intel has made its integrated graphics hardware OpenCL compatible, so that I can use the power of the integrated graphics unit; that would be the Iris Pro.

With that in mind, what is the purpose of an OpenCL-compatible processor? Is it just a convenience, so that kernels can fall back to the CPU if no other device is found, or is there an actual speed advantage to running code as OpenCL kernels instead of as ordinary (well-written, multithreaded C) programs on the CPU?

+7
performance c intel opencl processor
3 answers

For background, see https://software.intel.com/sites/default/files/m/d/4/1/d/8/Writing_Optimal_OpenCL_28tm_29_Code_with_Intel_28R_29_OpenCL_SDK.pdf

Basically, the Intel OpenCL compiler performs horizontal autovectorization for certain kinds of kernels. This means that with SSE4 you get 8 threads running in parallel on one core, much like an Nvidia GPU runs 32 threads in lock-step on one 32-wide SIMD unit.

There are two main advantages to this approach. First, what happens if, two years from now, the SSE vector width is increased to 16? Then your kernels are automatically vectorized to 16 threads when run on that CPU, with no need to recompile the code. Second, it is much easier to write an OpenCL kernel that vectorizes automatically than to write the same thing in assembly or C and coax your compiler into generating efficient code.

+9

As OpenCL implementations mature, you can achieve good levels of performance for your kernels across a wide range of devices. Some recent work in my research group shows that, in some cases, OpenCL codes achieve a similar fraction of peak hardware performance on the CPU and the GPU. On the CPU, the OpenCL kernels were vectorized very efficiently by Intel's OpenCL CPU implementation. On the GPU, efficient code was generated for both HPC and desktop devices from Nvidia (whose OpenCL support still works surprisingly well) and AMD.

If you are going to develop your OpenCL code anyway in order to use the GPU, you often get a fast multi-core + SIMD version for free by running the same code on the CPU.

For two recent papers from my group that describe in detail the performance portability results we have achieved across four different real applications with OpenCL, see:

"On the Performance Portability of Structured Grid Codes on Many-Core Computer Architectures", S. McIntosh-Smith, M. Boulton, D. Curran and J. Price. ISC, Leipzig, June 2014. DOI: 10.1007/978-3-319-07518-1_4

"High Performance in silico Virtual Drug Screening on Many-Core Processors", S. McIntosh-Smith, J. Price, R.B. Sessions and A.A. Ibarra. IJHPCA, 2014. DOI: 10.1177/1094342014528252

+4

I have thought about this for a while. You can get most of the benefits of OpenCL on a CPU without using OpenCL, and without too much difficulty, in C++. To do this you need:

  • Something for multithreading. I use OpenMP for this.
  • A SIMD library. I use Agner Fog's Vector Class Library (VCL) for this, which covers SSE2 through AVX-512.
  • A SIMD math library. Once again, I use Agner Fog's VCL for this.
  • A CPU dispatcher. Agner Fog's VCL provides this as well.

Using the CPU dispatcher, you detect at run time which hardware features are available and choose the best code path for that hardware. This provides one of the main advantages of OpenCL.

This gives you most of the advantages of OpenCL on a CPU without its disadvantages. You never have to worry about a vendor dropping driver support. Nvidia provides only minimal OpenCL support, including several years-old bugs that will most likely never be fixed (and on which I have wasted too much time). Intel only provides Iris Pro OpenCL drivers for Windows. With the method I propose, your kernels can use all the features of C++, including templates, rather than OpenCL's extended subset of C (although I do like the extensions). And you can be confident that your code does what you want, rather than being at the mercy of some device driver.

The only drawback of the method I propose is that you cannot simply install a new driver and have your code optimized for new hardware. However, the VCL already supports AVX-512, so it was built for hardware that has not even been released yet and will not need replacing for several years. In any case, to get the most out of your hardware you will almost certainly have to rewrite your kernels in OpenCL for that hardware anyway; a new driver only helps so much.

A note on the SIMD math library. You could use Intel's SVML for this, which is expensive and closed source (it is what the Intel OpenCL drivers use; search for svml after installing the drivers, and do not confuse the SDK with the drivers). Or you could use AMD's LIBM, which is free but also closed source. However, neither of these runs well on the competitor's processors. Agner Fog's VCL runs well on both vendors' processors, and it is open source and free.

+3