Base64 table design decoding

Question

I am reading this libb64 source code for encoding and decoding base64 data.

I know the encoding procedure, but I can’t understand how the following decoding table is constructed for quick searching in order to decode the encoded base64 characters. This is the table they use:

static const char decoding[] = {62,-1,-1,-1,63,52,53,54,55,56,57,58,59,60,61,-1,-1,-1,-2,-1,-1,-1,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,-1,-1,-1,-1,-1,-1,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51};

Can someone explain how the values in this table are used for decoding?

+5

c base64

cyber_raj Jul 19 '12 at 10:49

3 answers

As you know, each byte has 8 bits, which gives 256 possible combinations using 2 symbols (base 2).
With 2 symbols, you need 8 characters to represent one byte, for example "01010011".
With base 64, a single character can represent 64 combinations, that is, 6 bits.
So we have a base table:
A = 000000
B = 000001
C = 000010
...
If you have the word "Man", then you have the bytes:
01001101, 01100001, 01101110
and therefore the bit stream:
010011010110000101101110

Split it into groups of six bits: 010011 010110 000101 101110
010011 = T
010110 = W
000101 = F
101110 = u
So, "Man" => base64 encoded = 'TWFu'.
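
The bit regrouping above can be sketched in C. This is a minimal illustration, not code from libb64; the names b64_alphabet and encode_triplet are made up for this example, and it only handles a full group of exactly three bytes.

```c
/* Standard base64 alphabet: index 0..63 -> character. */
static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical helper: encode exactly three input bytes into four
   base64 characters. `out` must have room for 5 chars (4 + NUL). */
static void encode_triplet(const unsigned char in[3], char out[5])
{
    /* Concatenate the three bytes into one 24-bit value. */
    unsigned long bits = ((unsigned long)in[0] << 16)
                       | ((unsigned long)in[1] << 8)
                       |  (unsigned long)in[2];

    /* Cut it into four 6-bit groups, most significant group first,
       and look each group up in the alphabet. */
    out[0] = b64_alphabet[(bits >> 18) & 0x3F];
    out[1] = b64_alphabet[(bits >> 12) & 0x3F];
    out[2] = b64_alphabet[(bits >> 6) & 0x3F];
    out[3] = b64_alphabet[bits & 0x3F];
    out[4] = '\0';
}
```

Calling encode_triplet on the bytes of "Man" fills out with "TWFu", matching the hand calculation.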
As you can see, this works cleanly when the number of bits in the stream is a multiple of 6.

If the number of bits is not a multiple of 6, as with "Ma", you have the stream:
010011 010110 0001
and you need to pad the last group with zeros up to 6 bits:
010011 010110 000100
so the base64 encoding is:
010011 = T
010110 = W
000100 = E
So, "Ma" => "TWE"

When decoding, you need to truncate the stream to the largest length that is a multiple of 8, removing the extra bits to recover the original stream:
T = 010011
W = 010110
E = 000100
1) 010011 010110 000100
2) 01001101 01100001 00
3) 01001101 01100001 = 'Ma'
In fact, when trailing zero bits are added, the end of the Base64 string is marked with one '=' for each pair of added '00' bits ('Ma' ==> Base64 'TWE=').
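
The padding rule can be sketched the same way. This is a hedged illustration with hypothetical names (b64_alphabet, encode_tail), not libb64 code; it assumes the final group has exactly 1 or 2 bytes.

```c
/* Standard base64 alphabet: index 0..63 -> character. */
static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical helper: encode a final group of `len` bytes (1 or 2)
   into four characters, using '=' for the positions that carry no data.
   `out` must have room for 5 chars (4 + NUL). */
static void encode_tail(const unsigned char *in, int len, char out[5])
{
    /* Missing bytes are treated as zero bits. */
    unsigned long bits = (unsigned long)in[0] << 16;
    if (len == 2) {
        bits |= (unsigned long)in[1] << 8;
    }

    out[0] = b64_alphabet[(bits >> 18) & 0x3F];
    out[1] = b64_alphabet[(bits >> 12) & 0x3F];
    /* 2 bytes: third character holds 4 data bits + 2 zero bits, one '='.
       1 byte: second character holds 2 data bits + 4 zero bits, two '='. */
    out[2] = (len == 2) ? b64_alphabet[(bits >> 6) & 0x3F] : '=';
    out[3] = '=';
    out[4] = '\0';
}
```

With the bytes of "Ma" and len 2 this yields "TWE="; with the single byte "M" and len 1 it yields "TQ==".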

See also link: http://www.base64decode.org/

Base64 is a good way to represent data such as images as strings in the many applications where it is difficult to work directly with a raw binary stream. A raw binary stream is more compact because it is effectively base 256, but it is awkward to embed in, for example, HTML. So there is a trade-off: less traffic, or simpler string handling.

Also look at the ASCII codes: the base 64 characters range from '+' to 'z' in the ASCII table, but some values between '+' and 'z' are not base 64 characters.

'+' = ASCII DEC 43
...
'z' = ASCII DEC 122
From DEC 43 to 122 there are 80 values, but
43 OK = '+'
44 is not a base 64 character, so its decoding entry is -1 (invalid base64 character)
45 ...
46 ...
...
122 OK = 'z'
To decode a character, subtract 43 ('+') from its code so that '+' becomes index 0 of the array, which allows quick access by index. That gives decoding[80] = {62, -1, -1, ..., 49, 50, 51}. The one entry that is -2 instead of -1 sits at index 18, which is '=' (ASCII 61), the padding character.
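
To see where the 80 values come from, the table can be regenerated from the alphabet. This is a sketch under assumptions: the names TABLE_FIRST, TABLE_SIZE and build_decoding_table are invented here, and signed char is used so the negative markers survive on platforms where plain char is unsigned.

```c
#include <string.h>

enum {
    TABLE_FIRST = '+',                        /* ASCII 43, becomes index 0 */
    TABLE_LAST  = 'z',                        /* ASCII 122, becomes index 79 */
    TABLE_SIZE  = TABLE_LAST - TABLE_FIRST + 1
};

/* Hypothetical helper: fill `table` so that table[c - '+'] is the 6-bit
   value of base64 character c, -1 for characters outside the alphabet,
   and -2 for the '=' padding character. */
static void build_decoding_table(signed char table[TABLE_SIZE])
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    int i;

    /* Every slot starts out as "invalid" (each byte becomes -1). */
    memset(table, -1, TABLE_SIZE);

    /* Invert the alphabet: character -> position in the alphabet. */
    for (i = 0; i < 64; ++i) {
        table[alphabet[i] - TABLE_FIRST] = (signed char)i;
    }

    /* Mark the padding character separately. */
    table['=' - TABLE_FIRST] = -2;
}
```

The result has the same 80 entries as the decoding[] array quoted in the question.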

Roberto Novakoski
Developer Systems

+5

Roberto novakosky Jun 15 '13 at 3:33

Given these 2 mapping tables:

 static const char decodingTab[] = {62,-1,-1,-1,63,52,53,54,55,56,57,58,59,60,61,-1,-1,-1,-2,-1,-1,-1,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,-1,-1,-1,-1,-1,-1,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51};
 static unsigned char encodingTab[64]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

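decodingTab is the reverse mapping table of encodingTab. For a valid base64 character, the decodingTab entry should therefore never be -1. In fact, only 64 values are expected. However, decodingTab as shown has 80 entries, one for each ASCII code from '+' to 'z'. The entries for unexpected characters are set to -1 (an arbitrary number that is not in [0, 63]). For a valid character c and a value i in [0, 63], the following relations hold: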

 char c;          /* a valid base64 character */
 unsigned char i; /* a value in [0, 63] */
 ...
 encodingTab[decodingTab[c - '+']] == c;
 decodingTab[encodingTab[i] - '+'] == i;
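
The inverse relationship between the two tables can be checked mechanically. This is a small sketch, not part of libb64; tables_are_inverse is a made-up name, and signed char is used for the decoding table so that -1 and -2 stay negative even where plain char is unsigned.

```c
static const signed char decodingTab[] = {
    62,-1,-1,-1,63,52,53,54,55,56,57,58,59,60,61,-1,-1,-1,-2,-1,-1,-1,
    0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,
    -1,-1,-1,-1,-1,-1,
    26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,
    49,50,51
};
static const char encodingTab[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical check: returns 1 if decoding the i-th alphabet character
   gives back i for every i in [0, 63], otherwise 0. */
static int tables_are_inverse(void)
{
    int i;
    for (i = 0; i < 64; ++i) {
        /* Subtract '+' because index 0 of decodingTab stands for '+'. */
        if (decodingTab[encodingTab[i] - '+'] != i) {
            return 0;
        }
    }
    return 1;
}
```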

Hope it helps.

0

Badara Jan 11 '19 at 17:38

orlp · Accepted Answer · 2012-07-19T10:57:16+0000

This is a shifted and limited ASCII translation table. The keys of the table are ASCII values, and the values are base64 decoded values. The table is shifted so that index 0 maps to the ASCII + character, and the following indexes map to the ASCII values after +. The first entry in the table, for the ASCII + character, is mapped to the base64 value 62. Then three characters are skipped (ASCII , - .), and the next character, ASCII /, is mapped to the base64 value 63.

The rest will become apparent if you look at this table and the ASCII table.

Usage looks something like this:

 int decode_base64(char ch) {
     if (ch < '+' || ch > 'z') {
         return SOME_INVALID_CH_ERROR;
     }
     /* shift range into decoding table range */
     ch -= '+';
     int base64_val = decoding[ch];
     if (base64_val < 0) {
         return SOME_INVALID_CH_ERROR;
     }
     return base64_val;
 }
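
Building on that lookup, a whole string can be decoded by accumulating 6 bits per character and emitting a byte whenever 8 bits are available. This is a minimal sketch, not libb64's actual decoder: decode_char and decode_string are hypothetical names, and stopping at the first '=' while rejecting every other non-alphabet character is a policy chosen for this example.

```c
static const signed char decoding[] = {
    62,-1,-1,-1,63,52,53,54,55,56,57,58,59,60,61,-1,-1,-1,-2,-1,-1,-1,
    0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,
    -1,-1,-1,-1,-1,-1,
    26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,
    49,50,51
};

/* Map one character to its 6-bit value; -1 if invalid, -2 for '='. */
static int decode_char(char ch)
{
    if (ch < '+' || ch > 'z') {
        return -1;                 /* outside the range the table covers */
    }
    return decoding[ch - '+'];     /* shift so that '+' is index 0 */
}

/* Decode the NUL-terminated string `in` into `out` and return the number
   of bytes written, or -1 on an invalid character. `out` must be large
   enough for three bytes per four input characters. */
static long decode_string(const char *in, unsigned char *out)
{
    unsigned long acc = 0;         /* bit accumulator */
    int nbits = 0;                 /* number of unread bits in acc */
    long n = 0;

    for (; *in != '\0'; ++in) {
        int v = decode_char(*in);
        if (v == -2) {
            break;                 /* padding: no more data follows */
        }
        if (v < 0) {
            return -1;
        }
        acc = (acc << 6) | (unsigned long)v;
        nbits += 6;
        if (nbits >= 8) {          /* a full byte is available */
            nbits -= 8;
            out[n++] = (unsigned char)((acc >> nbits) & 0xFF);
        }
    }
    /* Any leftover bits (fewer than 8) are the zero padding bits. */
    return n;
}
```

Decoding "TWFu" writes the 3 bytes of "Man", and decoding "TWE=" writes the 2 bytes of "Ma".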
