NVIDIA's OpenCL compiler supports inline PTX assembly within OpenCL code using the "asm" keyword. The syntax is similar to GCC inline assembly. I am currently using this:
inline uint popcnt(const uint i) { uint n; asm("popc.b32 %0, %1;" : "=r"(n) : "r" (i)); return n; }
Tested and runs on Ubuntu Linux.
If you need more information, check out the NVIDIA oclInlinePTX sample code and the PTX ISA documentation.
If you use an AMD or Intel card, none of this is necessary, because you can just use the built-in popcount function available since OpenCL 1.2.
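If neither the PTX route nor the OpenCL 1.2 built-in is available, or if you just want a host-side reference to check the kernel's output against, a plain software bit count is one option. The following is only a minimal sketch in standard C, not something from the NVIDIA samples; the name popcnt_fallback is made up for illustration, and in OpenCL C you would presumably use uint in place of uint32_t.

```c
#include <stdint.h>

/* Hypothetical helper (name chosen for illustration only).
 * Counts the set bits of a 32-bit value without any special
 * instruction, using the well-known parallel (SWAR) bit count. */
static inline uint32_t popcnt_fallback(uint32_t i)
{
    /* Each 2-bit field now holds the count of its own two bits. */
    i = i - ((i >> 1) & 0x55555555u);
    /* Add neighbouring 2-bit counts into 4-bit fields. */
    i = (i & 0x33333333u) + ((i >> 2) & 0x33333333u);
    /* Add neighbouring 4-bit counts into 8-bit fields. */
    i = (i + (i >> 4)) & 0x0F0F0F0Fu;
    /* Sum the four byte counts into the top byte and shift it down. */
    return (i * 0x01010101u) >> 24;
}
```

Comparing a few values from the kernel against this function on the host is a cheap way to confirm that the inline PTX version behaves as expected.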