To convert from naive to compact position, you can iterate over the n-tuple and follow these steps for each position p :
- optionally check that position
p is available - set
p position as busy - subtract from
p number of occupied lower positions - save result to destination n-tuple
You can do this by maintaining an array of n bits for the busy state:
- steps 1, 2 and 4 are calculated in constant time
- step 3 can be efficiently calculated if the array is small, that is: 64 bits.
Here is the implementation:
#include <stdio.h> #include <stdlib.h> /* version for up to 9 positions */ #define BC9(n) ((((n)>>0)&1) + (((n)>>1)&1) + (((n)>>2)&1) + \ (((n)>>3)&1) + (((n)>>4)&1) + (((n)>>5)&1) + \ (((n)>>6)&1) + (((n)>>7)&1) + (((n)>>8)&1)) #define x4(m,n) m(n), m((n)+1), m((n)+2), m((n)+3) #define x16(m,n) x4(m,n), x4(m,(n)+4), x4(m,(n)+8), x4(m,(n)+12) #define x64(m,n) x16(m,n), x16(m,(n)+16), x16(m,(n)+32), x16(m,(n)+48) #define x256(m,n) x64(m,n), x64(m,(n)+64), x64(m,(n)+128), x64(m,(n)+192) static int const bc512[1 << 9] = { x256(BC9, 0), x256(BC9, 256), }; int encode9(int dest[], int src[], int n) { unsigned int busy = 0; for (int i = 0; i < n; i++) { int p = src[i]; unsigned int bit = 1 << p; //if (busy & bit) return 1; // optional validity check busy |= bit; dest[i] = p - bc512[busy & (bit - 1)]; } return 0; } /* version for up to 64 positions */ static inline int bitcount64(unsigned long long m) { m = m - ((m >> 1) & 0x5555555555555555); m = (m & 0x3333333333333333) + ((m >> 2) & 0x3333333333333333); m = (m + (m >> 4)) & 0x0f0f0f0f0f0f0f0f; m = m + (m >> 8); m = m + (m >> 16); m = m + (m >> 16 >> 16); return m & 0x3f; } int encode64(int dest[], int src[], int n) { unsigned long long busy = 0; for (int i = 0; i < n; i++) { int p = src[i]; unsigned long long bit = 1ULL << p; //if (busy & bit) return 1; // optional validity check busy |= bit; dest[i] = p - bitcount64(busy & (bit - 1)); } return 0; } int main(int argc, char *argv[]) { int src[argc], dest[argc]; int cur, max = 0, n = argc - 1; for (int i = 0; i < n; i++) { src[i] = cur = atoi(argv[i + 1]); if (max < cur) max = cur; } if (max < 9) { encode9(dest, src, n); } else { encode64(dest, src, n); } for (int i = 0; i < n; i++) { printf("%d ", dest[i]); } printf("\n"); return 0; }
The main optimization is to implement bitcount() , which you can adapt to your needs, specializing in the actual number of positions. I have published above effective solutions for small numbers up to 9 and large numbers up to 64, but you can create a more effective solution for 12 or 32 positions.
In terms of time complexity, in the general case we still have O (n 2 ) , but for small values of n it actually works in O (n.Log (n)) or better, since the parallel implementation of bitcount() can be reduced to the steps of log (n) or less for n up to 64.
You can look at http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive for inspiration and amazement.
Unfortunately, I'm still looking for ways to use this or a similar trick to decode ...