Single word algorithm

I am developing software for an embedded platform and need a one-word algorithm.

The problem is this: given the large integer represented by a sequence of 32-bit words (there may be many), we must divide it into another 32-bit word, i.e., calculate the quotient (also a large integer) and the remainder (32 bits).

Of course, if I were to develop this algorithm on x86, I could just take the GNU MP, but this library is too large for the embdedde platform. In addition, our processor does not have a hardware integer separator (integer division is performed in software). However, the processor has a fairly fast FPU, so the trick is to use floating point arithmetic where possible.

Any ideas how to implement this?

+4
source share
3 answers

Take a look at this: the algorithm divides the integer a [0..n-1] by one word 'c' using floating point for 64x32-> 32 divisions. The extremities of the private "q" are simply printed in a loop, you can save them in an array if you want. Note that you do not need GMP to run the algorithm - I use it only to compare the results.

#include <gmp.h> // divides a multi-precision integer a[0..n-1] by a single word c void div_by_limb(const unsigned *a, unsigned n, unsigned c) { typedef unsigned long long uint64; unsigned c_norm = c, sh = 0; while((c_norm & 0xC0000000) == 0) { // make sure the 2 MSB are set c_norm <<= 1; sh++; } // precompute the inverse of 'c' double inv1 = 1.0 / (double)c_norm, inv2 = 1.0 / (double)c; unsigned i, r = 0; printf("\nquotient: "); // quotient is printed in a loop for(i = n - 1; (int)i >= 0; i--) { // start from the most significant digit unsigned u1 = r, u0 = a[i]; union { struct { unsigned u0, u1; }; uint64 x; } s = {u0, u1}; // treat [u1, u0] as 64-bit int // divide a 2-word number [u1, u0] by 'c_norm' using floating-point unsigned q = floor((double)sx * inv1), q2; r = u0 - q * c_norm; // divide again: this time by 'c' q2 = floor((double)r * inv2); q = (q << sh) + q2; // reconstruct the quotient printf("%x", q); } r %= c; // adjust the residue after normalization printf("; residue: %x\n", r); } int main() { mpz_t z, quo, rem; mpz_init(z); // this is a dividend mpz_set_str(z, "9999999999999999999999999999999", 10); unsigned div = 9; // this is a divisor div_by_limb((unsigned *)z->_mp_d, mpz_size(z), div); mpz_init(quo); mpz_init(rem); mpz_tdiv_qr_ui(quo, rem, z, div); // divide 'z' by 'div' gmp_printf("compare: Quo: %Zx; Rem %Zx\n", quo, rem); mpz_clear(quo); mpz_clear(rem); mpz_clear(z); return 1; } 
+1
source

Sounds like a classic optimization. Instead of dividing by D multiply by 0x100000000/D , and then divide by 0x100000000 . The latter is just verbal, i.e. Trivial. Calculating the multiplier is a little more complicated, but not much.

See also this detailed article for a more detailed background.

+1
source

I believe the lookup table and Newton Raphson sequential approximation are the canonical choices used by hardware developers (who usually can't afford a gate for a complete hardware break). You can choose the tradeoff between accuracy and lead time.

+1
source

All Articles