OpenCL Newsletter

I am currently developing an OpenCL application (using JavaCL) that has to run on a very heterogeneous set of computers. To maximize performance, I want to use the GPU if one is available; otherwise I want to fall back to the CPU and use its SIMD instructions. My plan is to write the OpenCL code using vector types, because I understand that this lets the implementation vectorize the operations and use SIMD instructions on the CPU.

My question, however, is about which OpenCL implementation to use. For example, if the computer has an Nvidia GPU installed, I think it is better to use the Nvidia library, but if there is no GPU, I want to use the Intel library so that the SIMD instructions are used.

How do I achieve this? Is this done automatically, or do I need to include all the libraries and implement some logic to choose the right one? This seems like a problem that many people besides me must face.

Update: After testing various OpenCL drivers, this is my experience:

  • Intel: The JVM crashed when JavaCL tried to invoke it. After a reboot it no longer crashed the JVM, but it also did not return any devices (I was using an Intel i7 CPU). When I compiled the OpenCL code offline, it appeared to do some auto-vectorization, so the Intel compiler seems pretty good.

  • Nvidia: The installer refused to install its WHQL drivers, claiming that I did not have an Nvidia card (this computer has a GeForce GT 330M). When I tried it on another computer, I managed to get all the way to creating the kernel, but the first run crashed the drivers (the screen flickered for a while, and Windows 7 reported that it had to restart the drivers). The second run caused a blue screen of death.

  • AMD / ATI: The installer refused to install the 32-bit SDK (I tried it because I will use a 32-bit JVM), but the 64-bit SDK worked fine. This is the only driver with which I managed to execute the code (after a restart, because at first it produced a cryptic error message when compiling). However, it does not seem to do any implicit vectorization, and since I do not have an ATI GPU, I got no speedup compared to the Java implementation. If I use vector types I do see some improvement, though.

TL;DR: None of the drivers are ready for commercial use. Maybe I had better create a JNI module with C code compiled to use SSE instructions.

First, make sure you understand the concepts of hosts and devices: http://www.streamcomputing.eu/blog/2011-07-14/basic-concept-hosts-and-devices/

Basically, you can do exactly what you described: check whether a given driver is available, and if not, try the next one. Which one you try first depends entirely on your own preferences; I would choose the device on which I tested my kernel. In JavaCL, you can pick the fastest device with JavaCL.createBestContext and CLPlatform.getBestDevice; see the host code here: http://ochafik.com/blog/?p=501
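A minimal sketch of that fallback order, written in Java since the question uses JavaCL. It deliberately does not call JavaCL or OpenCL: the FallbackSelection class, the Platform and DeviceType types and the choose method are hypothetical stand-ins for whatever your platform query returns, and the GPU-then-CPU preference list is just one possible choice.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified model of the result of a platform/device query.
// A real program would fill this in from JavaCL or clGetPlatformIDs /
// clGetDeviceIDs; nothing in this class talks to an OpenCL driver.
public class FallbackSelection {

    enum DeviceType { GPU, CPU }

    record Platform(String vendor, List<DeviceType> devices) {}

    record Choice(Platform platform, DeviceType type) {}

    // Walks the preference list in order (e.g. GPU first, then CPU) and
    // returns the first platform that exposes a device of that type.
    // An empty result means no usable OpenCL device: use the plain Java code.
    static Optional<Choice> choose(List<Platform> platforms, List<DeviceType> preference) {
        for (DeviceType wanted : preference) {
            for (Platform p : platforms) {
                if (p.devices().contains(wanted)) {
                    return Optional.of(new Choice(p, wanted));
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<DeviceType> preference = List.of(DeviceType.GPU, DeviceType.CPU);

        // A machine with a GPU-only NVIDIA platform and a CPU-only Intel platform.
        List<Platform> machine = List.of(
                new Platform("NVIDIA", List.of(DeviceType.GPU)),
                new Platform("Intel", List.of(DeviceType.CPU)));
        System.out.println(choose(machine, preference));

        // A machine with only a CPU runtime installed.
        List<Platform> cpuOnly = List.of(new Platform("AMD", List.of(DeviceType.CPU)));
        System.out.println(choose(cpuOnly, preference));

        // No OpenCL platform at all: prints "false".
        System.out.println(choose(List.of(), preference).isPresent());
    }
}
```

Keeping the preference as an ordered list makes it easy to put first the device type you actually tested your kernel on.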

Be aware that NVidia does not support CPUs through its driver; only AMD and Intel do. In addition, it is somewhat more difficult to set up multiple devices (for example, two GPUs and a CPU).

There is no API that provides what you want. However, you can do the following:

I suggest that you iterate over the platform IDs returned by clGetPlatformIDs and, for each platform, query the number of devices and the type of each device (clGetDeviceIDs), looking for a platform that has both types. Then build a map in your code that lists, for each device type, the platforms that support it, sorted in some way. Finally, take the first element of the list for CL_DEVICE_TYPE_CPU and the first element of the list for CL_DEVICE_TYPE_GPU. If both results are the same platform (platform_cpu == platform_gpu), select it and use it for both.

If there is a platform that supports both, you will get a match as described, since the lists are ordered. You can then also do load balancing on that single platform if you like, for example on Intel's.
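The selection logic above could look roughly like this in Java. This is a sketch under assumptions: the PlatformMap class, the Platform record and the buildMap/first helpers are hypothetical, the device types would in practice come from clGetPlatformIDs/clGetDeviceIDs (or their JavaCL wrappers), and sorting platforms that expose more device types first is one assumed way of "sorting in some way" that makes a shared CPU+GPU platform win both lookups.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: a real host program would get the platforms from
// clGetPlatformIDs and each platform's device types from clGetDeviceIDs.
public class PlatformMap {

    enum DeviceType { CPU, GPU }

    record Platform(String name, Set<DeviceType> deviceTypes) {}

    // Builds device type -> platforms supporting it. Each list is sorted so
    // that platforms exposing more device types come first (an assumed
    // ordering). List.sort is stable, so ties keep the enumeration order.
    static Map<DeviceType, List<Platform>> buildMap(List<Platform> platforms) {
        Map<DeviceType, List<Platform>> map = new EnumMap<>(DeviceType.class);
        for (Platform p : platforms) {
            for (DeviceType t : p.deviceTypes()) {
                map.computeIfAbsent(t, k -> new ArrayList<>()).add(p);
            }
        }
        Comparator<Platform> moreTypesFirst =
                Comparator.comparingInt((Platform p) -> p.deviceTypes().size()).reversed();
        map.values().forEach(list -> list.sort(moreTypesFirst));
        return map;
    }

    // First entry of the list for a device type, or null if no platform offers it.
    static Platform first(Map<DeviceType, List<Platform>> map, DeviceType t) {
        List<Platform> list = map.get(t);
        return (list == null || list.isEmpty()) ? null : list.get(0);
    }

    public static void main(String[] args) {
        List<Platform> platforms = List.of(
                new Platform("NVIDIA", Set.of(DeviceType.GPU)),
                new Platform("AMD", Set.of(DeviceType.CPU)),
                new Platform("Intel", Set.of(DeviceType.CPU, DeviceType.GPU)));

        Map<DeviceType, List<Platform>> map = buildMap(platforms);
        Platform cpu = first(map, DeviceType.CPU);
        Platform gpu = first(map, DeviceType.GPU);

        if (cpu != null && cpu.equals(gpu)) {
            // One platform serves both types, so both devices can share one
            // context, which is what makes load balancing between them possible.
            System.out.println("Use " + cpu.name() + " for CPU and GPU");
        } else {
            System.out.println("CPU: " + (cpu == null ? "none" : cpu.name())
                    + ", GPU: " + (gpu == null ? "none" : gpu.name()));
        }
    }
}
```

With the sample platforms it prints "Use Intel for CPU and GPU". The platform_cpu == platform_gpu check matters because an OpenCL context can only contain devices from a single platform.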

Sorry for being late to the party, but regarding the behavior of Intel's implementation in JavaCL, I'm afraid you were bitten by a JavaCL bug:

https://github.com/ochafik/nativelibs4java/issues/297

Fixed in JavaCL 1.0.0-RC2!

Greetings
