Fastest way to convert binary to decimal?

I have four unsigned 32-bit integers representing an unsigned 128-bit integer, in little-endian order (least significant word first):

 typedef struct { unsigned int word[4]; } uint128; 

I would like to convert this number to its decimal string representation and output it to a file.

Right now I am using a bigint_divmod10 function that divides the number by 10, keeping track of the remainder. I call this function repeatedly, outputting the remainders as digits, until the number becomes zero. This is pretty slow. Is this the fastest way to do it? If not, is there a smarter way to implement this that I'm not seeing? I tried looking at GMP's get_str.c , but I find it pretty impenetrable.
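For reference, this is the shape of the loop I'm running (sketched here with a plain uint64_t standing in for the 128-bit type so it's self-contained; with the real bigint, uint128_divmod10 replaces the `/` and `%`):

```c
#include <assert.h>
#include <string.h>

/* Repeated divide-by-10: each iteration peels off the lowest decimal
   digit. Digits come out least significant first, so they are
   reversed into the output buffer at the end. */
static void to_decimal(unsigned long long n, char *buf) {
    char tmp[21];   /* 2^64 - 1 has at most 20 digits */
    int len = 0;
    do {            /* emit at least one digit, even for zero */
        tmp[len++] = (char)('0' + n % 10);
        n /= 10;
    } while (n != 0);
    for (int i = 0; i < len; i++)
        buf[i] = tmp[len - 1 - i];
    buf[len] = '\0';
}
```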

EDIT: here is the fastest code I have found so far for the divmod10 function:

 static unsigned uint128_divmod10(uint128 *value) {
     unsigned int a = value->word[3];
     unsigned int b = value->word[2];
     unsigned int c = value->word[1];
     unsigned int d = value->word[0];

     /* divide each word by 5 independently */
     unsigned int diva = a / 5;
     unsigned int divb = b / 5;
     unsigned int divc = c / 5;
     unsigned int divd = d / 5;

     value->word[3] = diva;
     value->word[2] = divb;
     value->word[1] = divc;
     value->word[0] = divd;

     /* per-word remainders mod 5 */
     unsigned int moda = a - diva * 5;
     unsigned int modb = b - divb * 5;
     unsigned int modc = c - divc * 5;
     unsigned int modd = d - divd * 5;

     /* fold the remainders down through the words, propagating
        the corrections back into the quotient */
     unsigned int mod = 0;
     mod += moda;
     unsigned int carryb = mod * 858993459;   /* 858993459 == (2^32 - 1) / 5 */
     mod += modb;
     if (mod >= 5) { mod -= 5; carryb++; }
     unsigned int carryc = mod * 858993459;
     mod += modc;
     if (mod >= 5) { mod -= 5; carryc++; }
     unsigned int carryd = mod * 858993459;
     mod += modd;
     if (mod >= 5) { mod -= 5; carryd++; }
     uint128_add(value, carryd, 0);
     uint128_add(value, carryc, 1);
     uint128_add(value, carryb, 2);

     /* value now holds the original divided by 5;
        halve it and fix up the remainder */
     if (value->word[0] & 1) {
         mod += 5;
     }
     uint128_shift(value, -1);
     return mod;
 }

where the add function is defined as:

 static void uint128_add(uint128 *value, unsigned int k, unsigned int pos) {
     unsigned int a = value->word[pos];
     value->word[pos] += k;
     if (value->word[pos] < a) {   /* overflow: propagate the carry */
         for (int i = pos + 1; i < 4; i++) {
             value->word[i]++;
             if (value->word[i]) {
                 break;
             }
         }
     }
 }
+7
c bigint bignum 128bit
6 answers

It depends on what else you do with the numbers. You can trade a small loss in space efficiency and a small loss in multiprecision-arithmetic speed for very fast decimal conversion. The key is to do your multiprecision arithmetic in a base that is a power of 10 rather than a power of 2.

For example, you could use base 10,000, where you pack one base-10,000 digit into each 16-bit word and do the digit-by-digit arithmetic in 32-bit integers. (If you're on a 64-bit machine, you can go bigger and use base 1,000,000,000 in 32-bit words.) This kind of code is reasonably fast, though not as fast as using a native power-of-two base, because you can't take advantage of the hardware carry bit, and you can't represent as many integers in the same number of bits. But conversion to and from decimal is a breeze, because you can convert each digit individually without any long division.

If you need to represent the full range of numbers from zero to ((1 << 128) - 1) , you can still do this; just add one extra digit, at the cost of your numbers taking slightly more space.
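To make the payoff concrete, here is a minimal sketch (my own illustration, not from any particular library) of printing a number stored as base-1,000,000,000 limbs, least significant limb first. Each limb maps to exactly nine decimal digits, so no multiprecision division is needed at all:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Number stored as n limbs in base 10^9, limb[0] least significant.
   The most significant nonzero limb prints normally; every limb
   below it prints zero-padded to exactly nine digits. */
static void print_base1e9(const unsigned int *limb, int n, char *out) {
    int i = n - 1;
    while (i > 0 && limb[i] == 0)   /* skip leading zero limbs */
        i--;
    out += sprintf(out, "%u", limb[i]);
    while (--i >= 0)
        out += sprintf(out, "%09u", limb[i]);
}
```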

If you really can't afford the extra space/time (maybe you're doing a lot of cryptographic 128-bit computation), then the simultaneous div/mod-by-10 trick is the fastest method I know. The only other trick: if small values are common, you can special-case them. (For example, if the three most significant 32-bit words are zero, just use native division for the conversion.)
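That special case can be as simple as this sketch against the question's struct (my illustration; it checks only the top two words, since a native 64-bit division already covers that case, and stubs out the slow path):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct { unsigned int word[4]; } uint128;

/* If the top two 32-bit words are zero, the value fits in 64 bits
   and native division handles the conversion directly; otherwise
   fall back to the multiprecision path (omitted in this sketch). */
static void uint128_to_dec(const uint128 *v, char *out) {
    if (v->word[3] == 0 && v->word[2] == 0) {
        unsigned long long n =
            ((unsigned long long)v->word[1] << 32) | v->word[0];
        sprintf(out, "%llu", n);
    } else {
        /* slow path: repeated bigint divmod, not shown here */
        sprintf(out, "<needs multiprecision path>");
    }
}
```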

Is there a smart way to implement this feature that I don't see?

Dave Hanson's C Interfaces and Implementations has a long chapter on multiple-precision arithmetic. Dividing a big number by a single digit is a special case with this efficient implementation:

 int XP_quotient(int n, T z, T x, int y) {
     int i;
     unsigned carry = 0;
     for (i = n - 1; i >= 0; i--) {
         carry = carry * BASE + x[i];   /* bring down the next digit */
         z[i] = carry / y;
         carry %= y;
     }
     return carry;                      /* the remainder */
 }

For a complete understanding it really helps to have the book, but the source code is still much easier to follow than the GNU source code. And you could easily adapt it to use base 10,000 (it currently uses base 256).
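As a concrete illustration of how that routine drives the conversion (my own harness around a Hanson-style short division, with BASE = 256 as in the book): repeatedly divide by 10 and collect the remainders as digits.

```c
#include <assert.h>
#include <string.h>

#define BASE 256u
typedef unsigned char digit;

/* Hanson-style short division: x (n base-256 digits, least
   significant first) divided by the small value y; the quotient
   goes to z, the remainder is returned. z may alias x. */
static unsigned XP_quotient(int n, digit *z, const digit *x, unsigned y) {
    unsigned carry = 0;
    for (int i = n - 1; i >= 0; i--) {
        carry = carry * BASE + x[i];
        z[i] = (digit)(carry / y);
        carry %= y;
    }
    return carry;
}

/* Convert an n-digit base-256 number (destroyed in the process)
   to a decimal string, one decimal digit per division. */
static void xp_to_dec(int n, digit *x, char *out) {
    char tmp[64];
    int len = 0, zero;
    do {
        tmp[len++] = (char)('0' + XP_quotient(n, x, x, 10));
        zero = 1;
        for (int i = 0; i < n; i++)
            if (x[i]) { zero = 0; break; }
    } while (!zero);
    for (int i = 0; i < len; i++)   /* digits were produced in reverse */
        out[i] = tmp[len - 1 - i];
    out[len] = '\0';
}
```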

Summary: if your performance bottleneck is the conversion to decimal, do your multiprecision arithmetic in a base that is a power of 10 . If your machine's native word size is 32 bits and you are writing C, use base 10,000 in 16-bit words.

+4

If your values are usually less than ULLONG_MAX (18446744073709551615), I would try sprintf(buf,"%llu",ullong_val) for them. It's pretty well optimized in the standard library, though parsing the format string will cost a few cycles.

Otherwise, I would write a bigint_divmod1000000000 function (or, with a better name, divmod10to9) and use that. It needs 9 times fewer divisions than bigint_divmod10 .
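A sketch of what such a function could look like on the question's four-word layout (assuming word[0] is least significant; it uses a plain 64-bit intermediate division, nothing clever):

```c
#include <assert.h>

typedef struct { unsigned int word[4]; } uint128;

/* Divide the 128-bit value by 10^9 in place and return the
   remainder, which supplies nine decimal digits at once. */
static unsigned int uint128_divmod1e9(uint128 *v) {
    unsigned long long rem = 0;
    for (int i = 3; i >= 0; i--) {
        /* rem < 10^9 < 2^30, so (rem << 32) | word fits in 64 bits */
        unsigned long long cur = (rem << 32) | v->word[i];
        v->word[i] = (unsigned int)(cur / 1000000000u);
        rem = cur % 1000000000u;
    }
    return (unsigned int)rem;
}
```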

+3

Use 8-bit lookup tables. You can have 4 lookup tables of 256 entries each: the first covers values 0-255 for the least significant byte, the second holds the first table's input values multiplied by 256, and so on.

So when you need the decimal digits, you sum the values from the lookup tables. You can add them as plain binary and then make one pass over the bytes to fix up the overflows.

Example: for the number 0x12345678, the first table entry at address 0x78 (= 120) is 0x010200 (the digits 1, 2, 0 packed one per byte); the second table entry at 0x56 (= 86, i.e. 22016 after multiplying by 256) is 0x0202000106; the third table gives 0x03040007080702; and the fourth table entry at 0x12 gives 0x030001090809080808 (that one no longer fits in 32-bit arithmetic, but you get the idea).

Then we sum these numbers as plain binary and make one pass, byte by byte, to fix the overflows. The loop body is something like:

 s = carry + val[i];
 val[i] = s % 10;
 carry = s / 10;   // you can replace these two operations with a table

Counting the operations needed:

1. Lookups and addition: 4 table lookups and 16 byte additions (note that no overflow handling is needed at this stage, because the per-byte sums cannot overflow).
2. One fix-up pass of 16 steps, about 3 operations per step.

A pessimistic upper bound is around 6 * 16 = 96 operations.

EDIT:

Here is C++ code; it is about 30% faster than the naive implementation.

 #include <iostream>
 #include <stdint.h>
 #include <array>

 static uint64_t lu[4][256];

 constexpr uint64_t lookup_value(uint64_t n) {
     uint64_t r = 0;
     uint64_t t = 1;
     while (n) {
         uint64_t rem = n % 10;
         n /= 10;
         r += rem * t;
         t *= 256;
     }
     return r;
 }

 void make_lu() {
     uint64_t step = 1;
     for (int j = 0; j < 4; ++j) {
         uint64_t n = 0;
         for (int i = 0; i < 256; ++i) {
             lu[j][i] = lookup_value(n);
             n += step;
         }
         step *= 256;
     }
 }

 struct DivMod {
     uint8_t div;
     uint8_t rem;
 };

 static DivMod dm[256];

 void make_dm() {
     for (int i = 0; i < 256; ++i) {
         dm[i].div = i / 10;
         dm[i].rem = i % 10;
     }
 }

 void init() {
     make_lu();
     make_dm();
 }

 uint64_t b2d(uint64_t n) {
     uint64_t r = 0;
     for (int i = 0; i < 4; ++i) {
         r += lu[i][(n >> (i * 8)) & 0xff];
     }
     uint64_t r2 = 0;
     uint64_t of = 0;
     for (int i = 0; i < 8; ++i) {
         uint64_t v = ((r >> (i * 8)) & 0xff) + of;
         DivMod &x = dm[v];
         of = x.div;
         r2 += uint64_t(x.rem) << (i * 8);
     }
     return r2;
 }

 int main() {
     init();
     uint64_t n;
     std::cin >> n;
     std::cout << std::hex << b2d(n) << "\n";
     return 0;
 }
+1

In the end, instead of working in the uint128 representation, I just operated on the decimal digit characters directly. This turned out to be much faster than converting from string to uint128 and back.
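One way this can work (my guess at the approach, sketched as an illustration): keep the running value as an array of decimal digits and feed the binary value in bit by bit with a double-and-add pass, so no divisions are ever needed.

```c
#include <assert.h>

/* Running value held as decimal digits, least significant first.
   Shifting in one bit == double the decimal number and add the bit. */
static void dec_double_add(char *dig, int *len, int bit) {
    int carry = bit;
    for (int i = 0; i < *len; i++) {
        int d = dig[i] * 2 + carry;
        dig[i] = (char)(d % 10);
        carry = d / 10;
    }
    if (carry)
        dig[(*len)++] = (char)carry;
}

/* Convert a 32-bit value by shifting its bits in from the top;
   the same loop extends naturally to four words of a 128-bit value. */
static void u32_to_digits(unsigned int n, char *dig, int *len) {
    dig[0] = 0;
    *len = 1;
    for (int i = 31; i >= 0; i--)
        dec_double_add(dig, len, (n >> i) & 1);
}
```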

0

The most immediate speedup will come from inlining the conversion rather than calling a function; it can be as simple as marking bigint_divmod10() inline , or using the profile-guided optimization your compiler offers.

-1

I know this question is old, but I want to contribute, since none of the answers avoids the division loop. This one uses powers of 2; I haven't benchmarked it, but in theory it should be faster than the others, and the pow function can be swapped out as well.

 #include <iostream>
 #include <cmath>
 using namespace std;

 #define MathBintodec(arr, len) ({ \
     int dec = 0; \
     int ci_; \
     for (ci_ = len; ci_--;) \
         dec += arr[ci_] * pow(2, len - ci_ - 1); \
     dec; \
 })

 int main() {
     int r[] = {1, 0, 0, 1, 0, 0};
     cout << MathBintodec(r, 6) << endl;
 }

Output: 36

-1
