Java and .NET: Base64 conversion confusion

I'm having trouble converting text to a Base64 string in Java (Android) and .NET (Visual Basic). Plain, readable ASCII characters convert without any problem. But special characters (those with a character code of 128 or above) give me trouble.

For example, I'm trying to convert the character whose ASCII code is 65 (the character "A").

My Java code is:

    char a = 65;
    String c = String.valueOf(a);
    byte[] bt = c.getBytes();
    String result = Base64.encodeToString(bt, Base64.DEFAULT);

And my .NET code:

    Dim c As String = Chr(65)
    Dim result As String = Convert.ToBase64String(System.Text.Encoding.UTF8.GetBytes(c))

Both return the same result: "QQ==". That is correct. But when I try to convert a special character, for example character code 153, the two return different results.

    char a = 153;
    String c = String.valueOf(a);
    byte[] bt = c.getBytes();
    String result = Base64.encodeToString(bt, Base64.DEFAULT);

This returns "wpk="

And my .NET code:

    Dim c As String = Chr(153)
    Dim result As String = Convert.ToBase64String(System.Text.Encoding.UTF8.GetBytes(c))

This returns "4oSi"

This is so strange. What is wrong here? I use the built-in Base64 libraries on both platforms. Is there something wrong with my code?

1 answer

Since the data you are encoding is encrypted data - random data in which any byte can be anywhere from 0 to 255, and which in its encrypted state has no character or text meaning - you need to treat this information as what it is; let's call it true binary data. Both Java and .NET fully support true binary data through their byte array primitives.

As you know, Base64 encoding is the process of converting true binary data (with byte values from 0 to 255) into a somewhat larger array of binary data (4 output bytes for every 3 input bytes) in which each byte is guaranteed to have the value of a printable ASCII character, somewhere between 32 and 126. Let me call that encoded binary. Encoded binary can then be safely converted to text, because virtually every known character set agrees on the printable ASCII range (32 to 126).
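To make the "printable range" claim concrete, here is a minimal sketch. It assumes the standard `java.util.Base64` class (Java 8+) rather than the `android.util.Base64` used in the question, and the class and method names are only illustrative.

```java
import java.util.Base64;

final class PrintableOutputDemo {
    // Encodes raw bytes and returns the Base64 output as bytes (padding included).
    static byte[] encode(byte[] raw) {
        return Base64.getEncoder().encode(raw);
    }

    // True when every byte lies in the printable ASCII range 32..126.
    static boolean allPrintableAscii(byte[] data) {
        for (byte b : data) {
            if (b < 32 || b > 126) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Every possible byte value, 0..255, as true binary input.
        byte[] raw = new byte[256];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) i;
        }
        byte[] encoded = encode(raw);
        System.out.println(encoded.length);             // 344 (256 bytes -> 86 groups of 4)
        System.out.println(allPrintableAscii(encoded)); // true
    }
}
```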

Thus, the main problem with both the Java and VB.NET fragments is that you are trying to use text primitives - char and String in Java, String in VB.NET - to store true binary data. Once you do that, it's too late. There is no reliable way to convert the data back to a byte array, because text primitives are simply not designed to store and retrieve binary data safely. For more on why this is the case, read "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" by Joel Spolsky.
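Here is a sketch of what probably happens to the value 153 on each platform. It rests on two assumptions: that the Java side encodes the String as UTF-8 (the default on Android), and that VB.NET's Chr(153) interprets 153 through the Windows-1252 code page, where that byte is the trademark sign U+2122. Both sides are simulated in plain Java with `java.util.Base64`; the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class CharsetMismatchDemo {
    static String base64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Java path from the question: char 153 is U+0099, which UTF-8 encodes as C2 99.
    static String javaPath() {
        String s = String.valueOf((char) 153);
        return base64(s.getBytes(StandardCharsets.UTF_8));
    }

    // Assumed .NET path: byte 0x99 read as Windows-1252 becomes U+2122,
    // which UTF-8 encodes as E2 84 A2.
    static String dotNetPath() {
        String s = new String(new byte[] {(byte) 153}, Charset.forName("windows-1252"));
        return base64(s.getBytes(StandardCharsets.UTF_8));
    }

    // No text primitive involved: the single byte 0x99 is encoded directly.
    static String rawBytePath() {
        return base64(new byte[] {(byte) 153});
    }

    public static void main(String[] args) {
        System.out.println(javaPath());    // wpk=
        System.out.println(dotNetPath());  // 4oSi
        System.out.println(rawBytePath()); // mQ==
    }
}
```

Under those assumptions the three results differ only because of what the text layer did to the value before Base64 ever saw it.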

Fortunately, the fix is simple. For Java, do not use char and String to store binary data. Put the data directly into a byte array. Try the following:

    byte[] bt = new byte[1];
    bt[0] = (byte) 153;
    String result = Base64.encodeToString(bt, Base64.DEFAULT);

I get mQ==

The fix is conceptually the same in VB.NET. Do not use String. Use an array of bytes.

    Dim bytes() As Byte = New Byte() {153}
    Dim result As String = Convert.ToBase64String(bytes)

Again, the answer is mQ==

Finally, after encoding, it is perfectly fine to use Strings. Your characters are in the ASCII subset, and converting between a String and a byte array will not corrupt the data, since virtually all character sets agree on the ASCII subset.
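A small sketch of that point, again assuming `java.util.Base64` and using illustrative names: the encoded text yields identical bytes under several common charsets, while a String holding the raw value 153 does not.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

final class EncodedTextIsAsciiDemo {
    // Base64 text for the single byte 153 ("mQ==", as in the answer).
    static String encoded() {
        return Base64.getEncoder().encodeToString(new byte[] {(byte) 153});
    }

    // True when the text converts to the same bytes under three common charsets.
    static boolean sameBytesInCommonCharsets(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return Arrays.equals(ascii, utf8) && Arrays.equals(ascii, latin1);
    }

    public static void main(String[] args) {
        // Encoded text is pure ASCII, so the charset choice does not matter.
        System.out.println(sameBytesInCommonCharsets(encoded()));                  // true
        // The raw value stored as a char gives different bytes in each charset.
        System.out.println(sameBytesInCommonCharsets(String.valueOf((char) 153))); // false
    }
}
```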

Remember that you will have the same problem in the reverse direction, when decoding. You will decode to a byte array, at which point you are back to true binary. From then on, the data should never be stored as a String until you are done with it, for example after decrypting it back to the original clear text.
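The decoding side could look roughly like this. It is only a sketch using `java.util.Base64` (on Android, `android.util.Base64.decode` plays the same role), and the helper names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

final class RoundTripDemo {
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Decoding goes straight back to a byte array, never through a String of raw data.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    // Shows what goes wrong if decoded binary is parked in a String:
    // returns true only if the bytes survive a UTF-8 String detour unchanged.
    static boolean survivesStringDetour(byte[] raw) {
        byte[] back = new String(raw, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(raw, back);
    }

    public static void main(String[] args) {
        byte[] original = {(byte) 153, 0, (byte) 255, 65};
        byte[] restored = decode(encode(original));
        System.out.println(Arrays.equals(original, restored)); // true
        System.out.println(survivesStringDetour(restored));    // false
    }
}
```

The detour fails here because a lone byte 0x99 is not valid UTF-8, so it gets replaced during the conversion to String.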

Hope this helps.
