Another approach (I am surprised it has not been mentioned here) is to build a table of 256 entries, where each entry holds the position of the least significant set bit of its index. You then scan the integer one byte at a time, looking each byte up in the table.
Something like this (I did not have the time to flesh it out; this is just to illustrate the idea):
    int bitcount(unsigned x)
    {
        /* table[i] = 1-based position of the lowest set bit of i; table[0] = 0 */
        static const unsigned char table[256] = { 0, 1, 2, 1, 3, 1, 2, 1, 4, /* ... */ };
        for (unsigned i = 0; i < sizeof(x); ++i, x >>= 8) {
            unsigned char r = table[x & 0xff];
            if (r)
                return r + i * 8;
        }
        return 0;  /* no bit set */
    }
The point of table-based approaches to this problem is that each if costs you something in terms of branch prediction, so you want as few of them as possible; a table lookup also cuts down on the number of bit shifts. Your approach uses one if and one shift per bit; this one uses one of each per byte. (I hope the optimizer unrolls the for loop and does not emit a compare/jump for it.) Some other answers here use fewer ifs than this one, but the table approach is simple and easy to understand. Of course, you should be guided by actual measurements to find out whether any of this matters.