Fast rounding of numbers> = 0 to a multiple of a certain power 2

There is a well-known pattern of rounding numbers to the nearest multiple of two. Increase the number by one less than the power of two, and then destroy all the bits below:

power = 1 << i
(n + (power - 1)) & ~(power - 1)

The problem with this pattern for my use case is that 0 is not rounded. The obvious solution is to add a branch, but I would prefer to avoid this because the performance of this code is extremely important.

I avoided this cost in some cases with context-sensitive hacking. Changing the previous condition (x <= FAST_PATH_LIMIT)to (x - 1 <= FAST_PATH_LIMIT - 1)force zero to zero and allows you to process it in a slow way. Unfortunately, the ability to do this is not always available.

I will gladly agree to build assemblies on the platform for a relatively obscure architecture. I just want you to know that there is a better way to do this. A magic trick in assembling C or x86 / ARM would really be useful.

+4
source share
4 answers

If you want the zero and other already rounded degrees of the two to always be rounded, then:

((n | 1) + (power - 1)) & ~(power - 1)

Or, if only for zero

((n | (!n)) + (power - 1)) & ~(power - 1)

Many architectures, such as PPC, do not have branching. (!n)

+2
source

ARM has a CLZ (Count Leading Zeros) command that allows you to do this without a loop. Intel has roughly equivalent BFS (Bit Scan Forward). Or you can quickly prepare the mask.

http://en.wikipedia.org/wiki/Find_first_set

+2

x86 :

mov edx, num
mov eax, 1
xor ebx, ebx     ; EBX = 0 for use in CMOVZ
rep bsr ecx, edx ; get index of highest bit set - if num is 0 ECX would be undefined...  use faster LZCNT if available.
cmovz ecx, ebx   ; ...so set it to 0 if that the case
shl eax, cl      ; get power of 2
cmp eax, edx     ; internally subtract num, which results in negative value (borrow/carry) except if it already a power of 2 or zero
setc cl          ; if negative value(borrow/carry)...
shl eax, cl      ; ...then shift by one to next highest power
; EAX = result

, .

+1

If the range of input values ​​is quite limited, for example 0..255, you can use the search table:

const unsigned char roundup_pow2 [] = {1, 2, 2, 2, 4, 4, 4, 4, // ...
};

unsigned int restricted_roundup_power2 (int v)
{
     if (v >= 0  &&  v <= sizeof roundup_pows)
           return roundup_pow2 [v];
     return 0; // ???
}

The range can be expanded with reuse:

unsigned int roundup_power2 (int v)
{
     if (v >= 0  &&  v <= sizeof roundup_pows)
           return roundup_pow2 [v];
     return 8 + roundup_power2 (v >> 8);
}

Of course, a simple program (on the left as an exercise) can be written to create tabular values, and not to calculate them manually.

0
source

All Articles