Quickly compute log2 for 64 bit integers

An excellent programming resource, the Twiddling Hacks bit, offers ( here ) the following method for computing log2 of a 32-bit integer:

#define LT(n) n, n, n, n, n, n, n, n, n, n, n, n, n, n, n, n static const char LogTable256[256] = { -1, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, LT(4), LT(5), LT(5), LT(6), LT(6), LT(6), LT(6), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7) }; unsigned int v; // 32-bit word to find the log of unsigned r; // r will be lg(v) register unsigned int t, tt; // temporaries if (tt = v >> 16) { r = (t = tt >> 8) ? 24 + LogTable256[t] : 16 + LogTable256[tt]; } else { r = (t = v >> 8) ? 8 + LogTable256[t] : LogTable256[v]; } 

and mentions that

The lookup table method uses only about 7 operations to search the log for a 32-bit value. If it is expanded for 64-bit quantities, approximately 9 operations.

but, alas, it does not provide any additional information on how to proceed to expand the algorithm to 64-bit integers.

Any hints about what a 64-bit algorithm looks like?

+42
c bit-manipulation 64bit 32bit-64bit lookup 64-bit
Jul 07 '12 at 15:33
source share
7 answers

The internal functions are really fast, but still not enough for a true cross-platform compiler-independent implementation of log2. So if someone is interested, here is the fastest, branchless, abstract algorithm similar to the processor that I need when I explore this topic myself.

 const int tab64[64] = { 63, 0, 58, 1, 59, 47, 53, 2, 60, 39, 48, 27, 54, 33, 42, 3, 61, 51, 37, 40, 49, 18, 28, 20, 55, 30, 34, 11, 43, 14, 22, 4, 62, 57, 46, 52, 38, 26, 32, 41, 50, 36, 17, 19, 29, 10, 13, 21, 56, 45, 25, 31, 35, 16, 9, 12, 44, 24, 15, 8, 23, 7, 6, 5}; int log2_64 (uint64_t value) { value |= value >> 1; value |= value >> 2; value |= value >> 4; value |= value >> 8; value |= value >> 16; value |= value >> 32; return tab64[((uint64_t)((value - (value >> 1))*0x07EDD5E59A4E28C2)) >> 58]; } 

Part of rounding to the next lower power 2 was taken from Strength-2 Borders and part of getting the number of trailing zeros was taken from BitScan (the code (bb & -bb) should select the rightmost bit, which is set to 1, which is not necessary after we rounded the value to the next power 2).

And a 32-bit implementation, by the way,

 const int tab32[32] = { 0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30, 8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31}; int log2_32 (uint32_t value) { value |= value >> 1; value |= value >> 2; value |= value >> 4; value |= value >> 8; value |= value >> 16; return tab32[(uint32_t)(value*0x07C4ACDD) >> 27]; } 

As with any other computational method, log2 requires that the input value be greater than zero.

+64
Jul 09 '12 at 15:55
source share
— -

If you use GCC, then a lookup table is not needed.

GCC provides a built-in function for determining the number of leading zeros:

Built-in Function: int __builtin_clz (unsigned int x)
Returns the number of initial 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined.

So you can define:

 #define LOG2(X) ((unsigned) (8*sizeof (unsigned long long) - __builtin_clzll((X)) - 1)) 

and it will work for any unsigned long long int. The result is rounded.

For x86 and AMD64, GCC will compile it into the bsr command, so the solution is very fast (much faster than lookup tables).

Working example :

 #include <stdio.h> #define LOG2(X) ((unsigned) (8*sizeof (unsigned long long) - __builtin_clzll((X)) - 1)) int main(void) { unsigned long long input; while (scanf("%llu", &input) == 1) { printf("log(%llu) = %u\n", input, LOG2(input)); } return 0; } 
+44
Jul 07 '12 at 16:30
source share

I tried to convert Find database 2 from an N-bit integer in O (lg (N)) operations with multiplication and search to 64-bit by crudely forcing the magic number. Needless to say, it took some time.

Then I found Desmond's answer and decided to try its magic number as a starting point. Since I have a 6-core processor, I ran it in parallel, starting with 0x07EDD5E59A4E28C2 / 6. I was surprised to find something right away. Turns off 0x07EDD5E59A4E28C2 / 2.

So here is the code for 0x07EDD5E59A4E28C2, which will save you the offset and subtract:

 int LogBase2(uint64_t n) { static const int table[64] = { 0, 58, 1, 59, 47, 53, 2, 60, 39, 48, 27, 54, 33, 42, 3, 61, 51, 37, 40, 49, 18, 28, 20, 55, 30, 34, 11, 43, 14, 22, 4, 62, 57, 46, 52, 38, 26, 32, 41, 50, 36, 17, 19, 29, 10, 13, 21, 56, 45, 25, 31, 35, 16, 9, 12, 44, 24, 15, 8, 23, 7, 6, 5, 63 }; n |= n >> 1; n |= n >> 2; n |= n >> 4; n |= n >> 8; n |= n >> 16; n |= n >> 32; return table[(n * 0x03f6eaf2cd271461) >> 58]; } 
+16
Apr 10 '14 at 22:59
source share

Integer Logarithm with Base-2

Here is what I do for unsigned 64-bit integers. This calculates the gender of the base-2 logarithm, which is equivalent to the index of the most significant bit. This method is curiously fast for large numbers, because it uses an expanded loop that always runs in log₂64 = 6 steps.

Essentially, what he does is subtract gradually smaller squares in the sequence {0 ≤ k ≤ 5: 2 ^ (2 ^ k)} = {2³², 2¹⁶, 2⁸, 2⁴, 2², 2¹} = {4294967296, 65536, 256, 16, 4, 2, 1} and summarizes the indicators k of the subtracted values.

 int uint64_log2(uint64_t n) { #define S(k) if (n >= (UINT64_C(1) << k)) { i += k; n >>= k; } int i = -(n == 0); S(32); S(16); S(8); S(4); S(2); S(1); return i; #undef S } 

Note that this returns -1 if an invalid input of 0 is given (this is what the initial one checks for -(n == 0) ). If you never expect to call it with n == 0 , you can substitute int i = 0; for the initializer and add assert(n != 0); when entering a function.

Integer Logarithm with Base-10

Locarithms of the integer value Base-10 can be calculated in the same way - with the largest square for the test will be 10¹⁶, because log₁₀2⁶⁴ ≅ 19.2659 ... (Note. This is not the fastest way to execute the logarithm with an integer 10, because it uses integer division, which inherently slower. A faster implementation would be to use a battery with values ​​that grow exponentially and compare with a battery that actually performs a binary search.)

 int uint64_log10(uint64_t n) { #define S(k, m) if (n >= UINT64_C(m)) { i += k; n /= UINT64_C(m); } int i = -(n == 0); S(16,10000000000000000); S(8,100000000); S(4,10000); S(2,100); S(1,10); return i; #undef S } 
+8
Jul 15 '14 at 1:59
source share

Here's a fairly compact and fast extension that doesn't use additional time series:

 r = 0; /* If its wider than 32 bits, then we already know that log >= 32. So store it in R. */ if (v >> 32) { r = 32; v >>= 32; } /* Now do the exact same thing as the 32 bit algorithm, except we ADD to R this time. */ if (tt = v >> 16) { r += (t = tt >> 8) ? 24 + LogTable256[t] : 16 + LogTable256[tt]; } else { r += (t = v >> 8) ? 8 + LogTable256[t] : LogTable256[v]; } 

Here is one built with if s chain, again without additional time series. Maybe not the fastest, though.

  if (tt = v >> 48) { r = (t = tt >> 8) ? 56 + LogTable256[t] : 48 + LogTable256[tt]; } else if (tt = v >> 32) { r = (t = tt >> 8) ? 40 + LogTable256[t] : 32 + LogTable256[tt]; } else if (tt = v >> 16) { r = (t = tt >> 8) ? 24 + LogTable256[t] : 16 + LogTable256[tt]; } else { r = (t = v >> 8) ? 8 + LogTable256[t] : LogTable256[v]; } 
+4
Jul 07 '12 at 15:56
source share

Basically, the algorithm detects which byte contains the most significant 1 bit, and then looks at that byte in the search to find the byte log, and then adds it to the byte position.

Here is a slightly simplified version of the 32-bit algorithm:

 if (tt = v >> 16) { if (t = tt >> 8) { r = 24 + LogTable256[t]; } else { r = 16 + LogTable256[tt]; } } else { if (t = v >> 8) { r = 8 + LogTable256[t]; } else { r = LogTable256[v]; } } 

This is the equivalent 64-bit algorithm:

 if (ttt = v >> 32) { if (tt = ttt >> 16) { if (t = tt >> 8) { r = 56 + LogTable256[t]; } else { r = 48 + LogTable256[tt]; } } else { if (t = ttt >> 8) { r = 40 + LogTable256[t]; } else { r = 32 + LogTable256[ttt]; } } } else { if (tt = v >> 16) { if (t = tt >> 8) { r = 24 + LogTable256[t]; } else { r = 16 + LogTable256[tt]; } } else { if (t = v >> 8) { r = 8 + LogTable256[t]; } else { r = LogTable256[v]; } } } 

I came up with an algorithm for any type of sizes that, in my opinion, are nicer than the original.

 unsigned int v = 42; unsigned int r = 0; unsigned int b; for (b = sizeof(v) << 2; b; b = b >> 1) { if (v >> b) { v = v >> b; r += b; } } 

Note: b = sizeof(v) << 2 sets b to half the number of bits in v. I used multiplication instead instead (simply because I liked it).

You can add a lookup table to this algorithm to speed it up, but this is more of a proof of concept.

+2
Jul 07 2018-12-17T00:
source share

Take this:

 typedef unsigned int uint; typedef uint64_t ulong; uint as_uint(const float x) { return *(uint*)&x; } ulong as_ulong(const double x) { return *(ulong*)&x; } uint log2_fast(const uint x) { return (as_uint((float)x)>>23)-127; } uint log2_fast(const ulong x) { return (uint)((as_ulong((double)x)>>52))-1023; } 

How it works: The input integer x converted to a number with a float and then interpreted as bits. The format with float IEEE stores the exponent in bits 30-23 as an integer with an offset of 127, therefore, shifting it 23 bits to the right and subtracting the offset, we get log2 (x). For a 64-bit integer input, x is cast to double , for which the exponent is in bits 62-52 (a shift of 52 bits to the right), and the offset of the exponent is 1023.

0
Jul 05 '19 at 15:08
source share



All Articles