Popcnt on OpenCL?

Newer NVIDIA GPUs support the __popc(x) intrinsic, which counts the number of bits set in a 32-bit register.

I am 99% sure OpenCL does not support inline assembly unless it is provided as a vendor extension.

1) Does AMD hardware support this? (I don't know.)

2) On OS X and Linux, how do you grab the NVIDIA intermediate language (PTX) that the kernel was compiled to, so that you can embed this instruction?

I figured out how to dump the PTX binary in PyOpenCL; now I just need to figure out how to load it back in with the changes.

    # create the program
    self.program = cl.Program(self.ctx, fstr).build()
    # print the compiled binary (PTX on NVIDIA) for the first device
    print(self.program.get_info(cl.program_info.BINARIES)[0])
2 answers

NVIDIA's OpenCL compiler supports inline PTX assembly within OpenCL code using the "asm" keyword. The notation is similar to GCC inline assembly. I am currently using this:

    inline uint popcnt(const uint i) {
        uint n;
        asm("popc.b32 %0, %1;" : "=r"(n) : "r" (i));
        return n;
    }

Tested and runs on Ubuntu Linux.

If you need more information, check out the NVIDIA oclInlinePTX sample code and the PTX ISA documentation.

If you use an AMD or Intel card, this is not an issue, because you can just use the popcount built-in function in OpenCL 1.2.
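To make the choice between the built-in and a fallback concrete, here is a minimal sketch, assuming a wrapper named popcnt32 (a hypothetical name, not part of OpenCL). It relies on the __OPENCL_C_VERSION__ macro, which is only defined from OpenCL C 1.2 onward, and otherwise falls back to a simple loop. It is written so the same text compiles as OpenCL C or as host C (where unsigned int is assumed to be 32 bits):

```c
/* popcnt32 is a hypothetical wrapper name chosen for this sketch. */
#if defined(__OPENCL_C_VERSION__) && (__OPENCL_C_VERSION__ >= 120)
/* OpenCL C 1.2 or later: map directly onto the popcount built-in. */
#define popcnt32(x) popcount((uint)(x))
#else
/* Older OpenCL C, or a host C compiler: count bits with a loop. */
static unsigned int popcnt32(unsigned int x)
{
    unsigned int n = 0u;
    while (x != 0u) {
        x &= x - 1u;   /* clear the lowest set bit */
        ++n;           /* one iteration per set bit */
    }
    return n;
}
#endif
```

Inside a kernel, popcnt32(v) can then be called the same way regardless of which branch was selected.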


As far as I know, no current version of OpenCL supports inline assembly, and there is no way to intercept the PTX (or CAL) during the JIT compilation cycle on OS X or Linux.

popc is a hardware instruction on NVIDIA compute capability 2.x hardware, but it is emulated on compute capability 1.x hardware. You can find the code for it in device_functions.h in the CUDA toolkit. You can always implement it as a function in OpenCL at the expense of some speed.
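For illustration, a software population count might look like the sketch below. It uses the common bit-parallel (SWAR) technique; this is not necessarily what device_functions.h does, and the name popcnt_swar is made up for this example. It is shown as host C so it can be tested on the CPU; in an OpenCL kernel you would drop the include and use uint in place of uint32_t:

```c
#include <stdint.h>

/* Hypothetical helper: bit-parallel population count of a 32-bit value. */
static uint32_t popcnt_swar(uint32_t x)
{
    /* Each 2-bit field now holds the count of set bits in that field. */
    x = x - ((x >> 1) & 0x55555555u);
    /* Add neighbouring 2-bit counts into 4-bit fields. */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    /* Add neighbouring 4-bit counts into 8-bit fields. */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    /* The multiply sums the four byte counts into the top byte. */
    return (x * 0x01010101u) >> 24;
}
```

This runs in a fixed number of operations with no branches, which tends to suit GPU execution better than a data-dependent loop, though it will still be slower than a single hardware instruction.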
