In addition to what has already been said, note that many compilers provide a built-in popcount, which can be faster than doing it by hand (or may not be; be sure to measure). Built-ins have the advantage that the compiler can turn them into a single instruction if your target architecture has one (although I've heard they do slow, silly things when they fall back to a library function). By contrast, you would be very lucky if the compiler recognized one of the algorithms from Sean Anderson's Bit Twiddling Hacks collection (though it might).
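For comparison, here is a minimal sketch of the kind of hand-written algorithm meant here: the parallel (SWAR) bit count from the Bit Twiddling Hacks collection, for a 32-bit value. The name popcount32_swar is just an illustrative choice. Whether a particular compiler recognizes this pattern and emits a single popcnt instruction is exactly what you would need to check, e.g. by reading the generated assembly.

```c
#include <stdint.h>

/* Counts the set bits of a 32-bit value without any compiler intrinsic,
   using the parallel (SWAR) method from "Bit Twiddling Hacks".
   popcount32_swar is a hypothetical name chosen for this sketch. */
static unsigned popcount32_swar(uint32_t v)
{
    v = v - ((v >> 1) & 0x55555555u);                 /* per-2-bit counts */
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); /* per-4-bit counts */
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 /* per-byte counts  */
    return (unsigned)((v * 0x01010101u) >> 24);       /* sum of the 4 bytes ends up in the top byte */
}
```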
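For MSVC it is __popcnt (and its variants, such as __popcnt16 and __popcnt64), for GCC it is __builtin_popcount (and its variants, such as __builtin_popcountl and __builtin_popcountll), and for OpenCL (it's good that you didn't ask about this, but why not throw it in) it is popcnt, but you have to enable the cl_amd_popcnt extension.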
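A minimal sketch of hiding these intrinsics behind one function, assuming a 32-bit argument. The name popcount32 and the Kernighan-loop fallback are illustrative choices, not part of any library. Keep in mind that MSVC's __popcnt emits the instruction unconditionally, and Microsoft's documentation says the results are unpredictable on CPUs without POPCNT support. With GCC or Clang on x86 you typically need -mpopcnt (or a suitable -march) for __builtin_popcount to become a single instruction rather than a library call or a longer instruction sequence.

```c
#include <stdint.h>

#if defined(_MSC_VER)
#include <intrin.h> /* declares __popcnt */
#endif

/* Hypothetical wrapper: uses the compiler built-in where available,
   otherwise falls back to a portable loop. */
static unsigned popcount32(uint32_t v)
{
#if defined(__GNUC__) || defined(__clang__)
    /* GCC/Clang built-in; a single instruction only if the target supports it */
    return (unsigned)__builtin_popcount(v);
#elif defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    /* MSVC x86/x64 intrinsic; requires a CPU with POPCNT at run time */
    return __popcnt(v);
#else
    /* Portable fallback: each iteration clears the lowest set bit */
    unsigned n = 0;
    while (v) {
        v &= v - 1;
        ++n;
    }
    return n;
#endif
}
```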