What text encoding scheme do you use when you have binary data to send over an ASCII channel?

If you have binary data to encode, what encoding scheme do you use?

I know about:

  • Hex encoding. Very simple, but rather verbose: expands one byte to two.
  • Base64. The most common, not so verbose: expands three bytes to four.
  • Base85. Not common, less verbose: expands four bytes to five.
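To make the size ratios in the list above concrete, here is a minimal sketch using only the Python standard library. The 12-byte sample input is an arbitrary assumption, chosen because its length is divisible by both 3 and 4 so that no padding distorts the comparison.

```python
import base64
import binascii

# Arbitrary sample input (an assumption for illustration): 12 bytes,
# a length divisible by both 3 and 4 so no scheme needs padding.
data = bytes(range(12))

hex_text = binascii.hexlify(data).decode("ascii")   # 1 byte  -> 2 characters
b64_text = base64.b64encode(data).decode("ascii")   # 3 bytes -> 4 characters
b85_text = base64.b85encode(data).decode("ascii")   # 4 bytes -> 5 characters

print(len(data), len(hex_text), len(b64_text), len(b85_text))
```

Note that `base64.b85encode` uses one particular Base85 alphabet; `base64.a85encode` produces the Ascii85 variant instead, which is part of why Base85 is less interoperable than Base64.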

Are there other common encoding schemes? If so, what are their advantages and disadvantages?

Edit: This is useful, for example, when trying to save arbitrary data in a cookie. Cookies can only store text, not arbitrary data, so you need to convert it somehow, preferably in a way that can be reversed. Also, suppose you are using a stateless server, so you cannot keep state on the server and just put an identifier in the cookie. Of course, if you do this, you also need some way to verify that what the user sends back to you is what you sent to the user, such as a signature.
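As a rough illustration of the signature idea, here is a sketch that attaches an HMAC-SHA256 tag to a hex-encoded payload. The key, the function names, and the `payload.signature` layout are all hypothetical choices for this example, not an established cookie format.

```python
import hashlib
import hmac

# Hypothetical server-side secret; in practice it would be loaded from
# configuration and never sent to the client.
SECRET_KEY = b"replace-with-a-real-secret"

def make_cookie_value(payload: bytes) -> str:
    """Return '<hex payload>.<hex HMAC>' (an assumed layout for this sketch)."""
    mac = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return payload.hex() + "." + mac.hex()

def verify_cookie_value(value: str):
    """Return the original payload, or None if the value is malformed or tampered with."""
    try:
        payload_hex, mac_hex = value.split(".")
        payload = bytes.fromhex(payload_hex)
        mac = bytes.fromhex(mac_hex)
    except ValueError:
        return None
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    return payload if hmac.compare_digest(mac, expected) else None
```

Hex is used here only because its output is trivially cookie-safe; the same structure would work with any of the encodings listed above. A signature only detects tampering. It does not hide the payload from the user.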

Also, since the current consensus is that you should use Base64 because it is so widespread, I will point out that this is what I use too. I am just wondering whether anyone has used something else, and if so, why.

Edit: Just in case someone stumbles across this: if you want to use Base64 to store data in a cookie, you need to use a modified Base64 implementation. See this answer for the reason.

+7
encoding base64 hex
4 answers

You need to be careful when encoding cookie values. See this older answer:

With version 0 cookies, values must not contain whitespace, brackets, parentheses, equals signs, commas, double quotes, slashes, question marks, at signs, colons, and semicolons. Empty values may not behave the same way in all browsers.

Base64 encoding can generate = characters for certain inputs, and this is not technically allowed in cookies (cookies version 0, in any case, which are most widely supported). In practice, I suspect that = will work fine, but maybe not.

I would suggest that if you want to be absolutely sure that your encoded binary data is cookie compatible, basic hexadecimal encoding is the safest choice (e.g. in Java).

edit: As @Paul explained, there is a modified version of Base64 that is "URL safe" (and, I suppose, "cookie safe"). That said, having to use a modified version of a standard algorithm rather dilutes its charm.

edit: @shoosh pointed out that = is only used to pad the end of a Base64 string, so you can trim the = characters, set the cookie, and then add them back when you need to decode it.
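The trimming approach could look something like the following sketch, which uses the URL-safe Base64 alphabet from Python's standard library (`-` and `_` in place of `+` and `/`). The function names are made up for this example.

```python
import base64

def encode_for_cookie(data: bytes) -> str:
    """URL-safe Base64 with the trailing '=' padding stripped."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def decode_from_cookie(text: str) -> bytes:
    """Restore the stripped padding, then decode."""
    # Padded Base64 text always has a length that is a multiple of 4,
    # so the number of missing '=' characters follows from the length.
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)
```

The result contains only letters, digits, `-`, and `_`, none of which appear in the list of characters disallowed in version 0 cookies quoted above.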

+13

Base64 wins because it is so common that I never have to worry about rolling my own encoder/decoder. I have not worked on any application where I was worried about saving bandwidth or file space for encoded binary data.

+4

There was once UTF-7. It is officially deprecated, but it still works as an ACE (ASCII Compatible Encoding). Nowadays there is IDN.

+2

Base64 is the de facto standard. Using anything else is asking for trouble.

+1