count_leading_zero_bits is often one instruction for which the compiler will provide an inline function. otherwise put it in a loop.
count_trailing_zero_bits can use count_leading_zero_bits (x & -x) or search for debruijn if the first is a loop.
for simplicity, I assume 32-bit values.
int offset_of_zero_bits_over_msb_of_other_value( unsigned width , unsigned value , unsigned other ) { int count = 0; int offset = -1; int last = 1; int lz = count_leading_zero_bits( other ); other |= ((1<<(32-lz2))-1);
The idea of ββthe code is as follows:
val1: 00000000000000000000000010000000 ( 128 ) val2: 01000001000100000000100100001001 ( 1091569929 ) lz1: 24 lz2: 1 val2: 01000001000100000000100011111111 // |= ((1<<(32-lz1))-1); val2: 10111110111011111111011100000000 // = ~val2 val2: 00011110011001111111001100000000 // &= val2>>1 , count = 1 val2: 00001110001000111111000100000000 // &= val2>>1 , count = 2 val2: 00000110000000011111000000000000 // &= val2>>1 , count = 3 val2: 00000010000000001111000000000000 // &= val2>>1 , count = 4 val2: 00000000000000000111000000000000 // &= val2>>1 , count = 5 val2: 00000000000000000011000000000000 // &= val2>>1 , count = 6 val2: 00000000000000000001000000000000 // &= val2>>1 , count = 7 val2: 00000000000000000000000000000000 // &= val2>>1 , count = 8
Thus, at each step, all ranges of zeros, now alone, are hidden on the right. When the value is zero, the number of steps taken is the width of the widest run. At any point, counting the number of trailing zeros will give an offset to the nearest range of at least count zeros.
If at any point the counter exceeds the width, you can stop the iteration. The maximum number of iterations is the width, not the size of the word. You can do this O (log n) width, because you can double the amount of shift in each iteration until you exceed the width.
Here is a DeBruijn search for counting trailing zero bits for 32-bit values.
static const int MultiplyDeBruijnBitPosition[32] = { 0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8, 31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9 }; r = MultiplyDeBruijnBitPosition[((uint32_t)((v & -v) * 0x077CB531U)) >> 27];
I noticed that in both examples val1 has only one bit. If so, you can use the DeBruijn trick to search for MSB.