I am developing an OpenCL application (using JavaCL) for a very heterogeneous set of computers. To maximize performance, I want to use the GPU if one is available and otherwise fall back to the CPU and use its SIMD instructions. My plan is to write the OpenCL code using vector types, because as I understand it this lets a CPU implementation map the operations onto SIMD instructions.
However, my question is about which OpenCL implementation to use. For example, if the computer has an Nvidia GPU installed, I think it is best to use Nvidia's library, but if there is no GPU, I want to use Intel's library to get the SIMD instructions.
How do I achieve this? Is it handled automatically, or do I have to bundle all the libraries and implement some logic to choose the right one? This seems like a problem that more people than just me are facing.
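To make the "some logic to choose the right one" part concrete, here is a minimal sketch of the fallback I have in mind: prefer a GPU, otherwise take a CPU, otherwise stay in plain Java. It assumes that an installable client driver (ICD) loader exposes the platforms of several vendors side by side, which may not hold on every machine. `DeviceChooser`, `Device` and `Kind` are hypothetical names made up for illustration and are not part of JavaCL; filling the list from JavaCL's platform and device enumeration is left out.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical, library-independent sketch: none of these types come from JavaCL.
class DeviceChooser {
    // Device categories, declared from most preferred to least preferred.
    enum Kind { GPU, CPU }

    // Minimal stand-in for what an OpenCL binding reports about one device.
    static final class Device {
        final String platform;
        final Kind kind;

        Device(String platform, Kind kind) {
            this.platform = platform;
            this.kind = kind;
        }

        @Override
        public String toString() {
            return kind + " on " + platform;
        }
    }

    // Returns a GPU if the list contains one, otherwise a CPU;
    // returns an empty Optional if no device was found at all.
    static Optional<Device> choose(List<Device> devices) {
        // Enums compare by declaration order, so GPU sorts before CPU.
        return devices.stream().min(Comparator.comparing((Device d) -> d.kind));
    }

    public static void main(String[] args) {
        // Assumed example data; real code would query the installed platforms.
        List<Device> found = List.of(
                new Device("Intel OpenCL", Kind.CPU),
                new Device("NVIDIA CUDA", Kind.GPU));
        System.out.println(choose(found)
                .map(Device::toString)
                .orElse("no OpenCL device, use plain Java"));
    }
}
```

The empty result is the case where the application would fall back to a non-OpenCL code path.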
Update: After testing various OpenCL drivers, this is my experience:
Intel: The JVM crashed when JavaCL tried to call it. After a reboot it no longer crashed the JVM, but it also did not return any devices (I am using an Intel i7 CPU). When I compiled the OpenCL code offline, it seemed to be able to do some auto-vectorization, so the Intel compiler seems pretty nice.
Nvidia: The installer refused to install the WHQL drivers because it claimed that I did not have an Nvidia card (this computer has a GeForce GT 330M). When I tried it on another computer, I managed to get all the way to creating the kernel, but on the first run it crashed the drivers (the screen flickered for a while, and Windows 7 reported that it had to restart the drivers). The second run caused a blue screen of death.
AMD/ATI: The 32-bit SDK refused to install (I tried it because I will be using a 32-bit JVM), but the 64-bit SDK worked fine. This is the only driver with which I managed to execute the code (after a restart, because at first it produced a cryptic error message when compiling). However, it does not seem to be able to do any implicit vectorization, and since I do not have an ATI GPU, I did not get any performance gain compared to the Java implementation. If I use vector types I can see some improvement, though.
TL;DR: None of the drivers are ready for commercial use. I am probably better off creating a JNI module with C code compiled to use SSE instructions.
cross-platform installation simd opencl distribution
Yrlec