Convert a 128-bit hexadecimal string to a base-36 string

I have a 128-bit hexadecimal number stored in a string (from md5; security is not a concern here) that I would like to convert to a base-36 string. If it were a 64-bit or smaller number, I would convert it to a 64-bit integer and then use the algorithm I found for converting integers to base-36 strings, but this number is too big for that, so I am kind of at a loss for how to approach this. Any recommendations would be appreciated.

Edit: After Roland Illig pointed out the hassle of telling 0/O and 1/l apart over the phone, and how little data density base 36 gains over hex, I think I can stay with hexadecimal. I'm still curious, though, whether there is a relatively simple way to convert a hexadecimal string of arbitrary length to a base-36 string.

+6
c++
4 answers

Base-36 encoding requires 6 bits to store each token. Same as base-64, but not using 28 of the available tokens. Solving 36^n >= 2^128 gives n >= log(2^128) / log(36) ≈ 24.76, so 25 tokens are needed to encode the value.

Base-64 encoding also requires 6 bits per token, but all possible token values are used. Solving 64^n >= 2^128 gives n >= log(2^128) / log(64) ≈ 21.33, so 22 tokens are needed to encode the value.
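The two token counts above can be checked with a few lines of code. This is only a sketch: `digits_needed` is a hypothetical helper name, and it uses the equivalent form n >= bits / log2(base).

```cpp
#include <cmath>

// Smallest n such that base^n >= 2^bits, computed as ceil(bits / log2(base)).
// Hypothetical helper for illustration. It relies on floating point, so treat
// results with care when bits / log2(base) is very close to a whole number.
int digits_needed(int bits, int base) {
    return static_cast<int>(std::ceil(bits / std::log2(static_cast<double>(base))));
}
```

With this, `digits_needed(128, 36)` gives 25 and `digits_needed(128, 64)` gives 22, matching the figures above.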

Computing a base-36 encoding requires division by powers of 36. There are no simple shortcuts; you need a division algorithm that can work with 128-bit values. Base-64 encoding is much easier to compute, since 64 is a power of 2: just take 6 bits at a time and shift by 6, only 22 times, to consume all 128 bits.
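As a rough illustration of the "take 6 bits, shift by 6" idea, here is a sketch that assumes the 128-bit value is already split into two 64-bit halves. The function name and the choice of alphabet are my own assumptions. Note that it writes the value as a positional base-64 number; that is not the same output as an RFC 4648 Base64 encoder, which works on the byte stream from the left and pads on the right.

```cpp
#include <cstdint>
#include <string>

// Writes the 128-bit value (hi:lo) as 22 base-64 digits, most significant first.
// Hypothetical helper; the alphabet is the usual Base64 one, used here only as
// a set of 64 distinct symbols.
std::string to_base64_digits(std::uint64_t hi, std::uint64_t lo) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out(22, 'A');
    for (int i = 21; i >= 0; --i) {
        out[i] = alphabet[lo & 0x3F];   // the low 6 bits are the next digit
        lo = (lo >> 6) | (hi << 58);    // shift the whole 128-bit value right by 6
        hi >>= 6;
    }
    return out;
}
```

Since 22 * 6 = 132, the leading digit only ever carries 2 significant bits.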

Why do you want to use base-36? Base-64 encoders are standard. If you really have a token space limitation (you shouldn't, ASCII rulez), then at least use base-32 encoding. Or any other power of 2; base-16 is hex.

+6

If the only thing missing is support for unsigned 128-bit integers, here's the solution:

    #include <stdio.h>
    #include <inttypes.h>

    /* 128-bit unsigned integer as four 32-bit limbs; v3 is the most significant. */
    typedef struct {
        uint32_t v3, v2, v1, v0;
    } uint128;

    /* Schoolbook long division of a 128-bit number by a 32-bit divisor,
       one limb at a time from the most significant limb down. */
    static void uint128_divmod(uint128 *out_div, uint32_t *out_mod,
                               const uint128 *in_num, uint32_t in_den)
    {
        uint64_t x = 0;

        x = (x << 32) + in_num->v3;
        out_div->v3 = (uint32_t)(x / in_den);
        x %= in_den;

        x = (x << 32) + in_num->v2;
        out_div->v2 = (uint32_t)(x / in_den);
        x %= in_den;

        x = (x << 32) + in_num->v1;
        out_div->v1 = (uint32_t)(x / in_den);
        x %= in_den;

        x = (x << 32) + in_num->v0;
        out_div->v0 = (uint32_t)(x / in_den);
        x %= in_den;

        *out_mod = (uint32_t)x;
    }

    int main(void)
    {
        uint128 x = { 0x12345678, 0x12345678, 0x12345678, 0x12345678 };
        uint128 result;
        uint32_t mod;

        uint128_divmod(&result, &mod, &x, 16);
        fprintf(stdout,
                "%08" PRIx32 " %08" PRIx32 " %08" PRIx32 " %08" PRIx32
                " rest %08" PRIx32 "\n",
                result.v3, result.v2, result.v1, result.v0, mod);
        return 0;
    }

With this function, you can repeatedly divide the number by 36. Each remainder is one base-36 digit, starting with the least significant one, and you stop once the quotient reaches zero.
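To make the repeated-division loop concrete, here is a sketch in C++ that parses a 32-digit hex string into four 32-bit limbs and collects the remainders. The function name `hex128_to_base36` and the lowercase digit alphabet are assumptions of mine, and input validation is kept minimal.

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts exactly 32 hex digits (128 bits) to base 36.
// Validation is minimal; std::stoul is trusted to parse each 8-digit group.
std::string hex128_to_base36(const std::string& hex) {
    if (hex.size() != 32) throw std::invalid_argument("expected 32 hex digits");

    std::uint32_t limb[4];  // limb[0] holds the most significant 32 bits
    for (int i = 0; i < 4; ++i)
        limb[i] = static_cast<std::uint32_t>(std::stoul(hex.substr(i * 8, 8), nullptr, 16));

    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    std::string out;
    bool nonzero;
    do {
        // One long division of the whole number by 36, limb by limb.
        std::uint64_t rem = 0;
        nonzero = false;
        for (int i = 0; i < 4; ++i) {
            std::uint64_t cur = (rem << 32) | limb[i];
            limb[i] = static_cast<std::uint32_t>(cur / 36);
            rem = cur % 36;
            if (limb[i] != 0) nonzero = true;
        }
        out.push_back(digits[rem]);  // remainders arrive least significant first
    } while (nonzero);

    std::reverse(out.begin(), out.end());
    return out;
}
```

The result is not padded, so small values produce short strings; if a fixed width is needed, left-pad with '0' to 25 characters.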

+1

If you are using C++ with .NET 4, you can always use the System.Numerics.BigInteger class. You can try calling one of the ToString overloads to see whether it gets you to base 36.

Alternatively, look at one of the many big-integer libraries, for example Matt McCutchen's C++ Big Integer Library, although you may need to dig into the classes in order to use a custom base such as 36.

+1

Two things:
1. It is actually not that difficult to divide a byte string by 36. But if you don't want to go to the trouble of implementing that, you can use base-32 encoding, which needs 26 characters instead of 25.
2. If you want to read the result to people over the telephone, you absolutely must add a simple checksum to your string. It will cost one or two characters, but it will save you a huge amount of "Chinese whispers" with annoyed customers.
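Both points can be sketched in a few lines. The code below is only an illustration under my own assumptions: `hex_to_base36` divides the sequence of hex digits by 36 in place, so it works for any length, and `add_check_char` uses a deliberately simple, hypothetical scheme (sum of the digit values mod 36). That scheme catches any single mistyped character but not two swapped characters.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

static const char kBase36[] = "0123456789abcdefghijklmnopqrstuvwxyz";

// Converts a hex string of any length to base 36 by repeated long division.
std::string hex_to_base36(const std::string& hex) {
    std::vector<int> num;  // hex digit values, most significant first
    for (char c : hex) {
        if (c >= '0' && c <= '9') num.push_back(c - '0');
        else if (c >= 'a' && c <= 'f') num.push_back(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') num.push_back(c - 'A' + 10);
        else throw std::invalid_argument("not a hex digit");
    }
    std::string out;
    std::size_t start = 0;  // digits before this index are known to be zero
    while (start < num.size()) {
        int rem = 0;
        for (std::size_t i = start; i < num.size(); ++i) {
            int cur = rem * 16 + num[i];  // at most 35 * 16 + 15
            num[i] = cur / 36;            // quotient digit, always below 16
            rem = cur % 36;
        }
        out.push_back(kBase36[rem]);      // least significant digit first
        while (start < num.size() && num[start] == 0) ++start;
    }
    if (out.empty()) out = "0";
    std::reverse(out.begin(), out.end());
    return out;
}

// Hypothetical scheme: appends one check character, the sum of all digit
// values mod 36. Expects lowercase base-36 input as produced above.
std::string add_check_char(const std::string& s) {
    int sum = 0;
    for (char c : s)
        sum += (c >= '0' && c <= '9') ? c - '0' : c - 'a' + 10;
    return s + kBase36[sum % 36];
}
```

For example, `hex_to_base36("ff")` gives "73" (7 * 36 + 3 = 255), and `add_check_char("73")` gives "73a".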

+1
