Why is `Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(x)) == x` not always true?

In .NET, why is it not true that:

Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(x))

returns the original byte array for an arbitrary byte array x?

This is mentioned in an answer to another question, but the answerer does not explain why.

+5
4 answers

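Unicode (and therefore UTF-8) allows more than one representation of the same character: a precomposed code point and a base letter followed by combining marks look identical but are different code point sequences.

Therefore the bytes for a piece of text can change if the string is converted to a different (canonical) form along the way. GetString and GetBytes do not normalize by themselves, so this only happens when the string is normalized explicitly.

See also String.Normalize(System.Text.NormalizationForm.FormD).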

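See also:

Unicode has characters with multiple equivalent representations. For example, any of the following sequences can represent the letter "ắ":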

"\u1EAF" 
"\u0103\u0301" 
"\u0061\u0306\u0301" 

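These sequences look the same when displayed, but they are different sequences of code points, so they encode to different UTF-8 bytes. Unicode normalization converts equivalent sequences to a single form.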
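To make the difference concrete, here is a minimal sketch that prints the UTF-8 bytes of the three escape sequences listed above and compares them after normalization. The class and method names (NormalizationDemo, Utf8Hex, EqualAfterNfc) are made up for illustration; only the Encoding, BitConverter and String.Normalize calls come from the framework.

```csharp
using System;
using System.Text;

public static class NormalizationDemo
{
    // The three code point sequences from the list above.
    public static readonly string[] Forms =
    {
        "\u1EAF",             // single precomposed code point
        "\u0103\u0301",       // a-with-breve + combining acute
        "\u0061\u0306\u0301", // a + combining breve + combining acute
    };

    // UTF-8 bytes of s as hyphen-separated hex.
    public static string Utf8Hex(string s) =>
        BitConverter.ToString(Encoding.UTF8.GetBytes(s));

    // True if a and b are identical once both are normalized to form C.
    public static bool EqualAfterNfc(string a, string b) =>
        a.Normalize(NormalizationForm.FormC) == b.Normalize(NormalizationForm.FormC);
}
```

Utf8Hex gives E1-BA-AF, C4-83-CC-81 and 61-CC-86-CC-81 for the three forms, so the byte arrays differ even though the rendered text does not. EqualAfterNfc returns true for any pair, because normalization (not decoding) is what maps them to one form.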

+1

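First, as watbywbarif mentioned, you should not compare arrays with ==; that does not work.

But even if you compare the arrays correctly (for example, with SequenceEqual() or just by inspecting them), they are not always the same. One case where this happens is when x is not valid UTF-8.

For example, the 1-byte sequence 0xFF is not valid UTF-8. So what does Encoding.UTF8.GetString(new byte[] { 0xFF }) return? It returns U+FFFD, the replacement character. And, of course, calling Encoding.UTF8.GetBytes() on that does not give you back 0xFF.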
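The 0xFF case can be checked with a short sketch like the one below. The helper names (InvalidUtf8Demo, RoundTrip, SurvivesRoundTrip, IsValidUtf8) are hypothetical. The strict decoder is an addition not mentioned in the answer: it uses the UTF8Encoding constructor with throwOnInvalidBytes set to true, so malformed input raises an exception instead of being replaced.

```csharp
using System;
using System.Linq;
using System.Text;

public static class InvalidUtf8Demo
{
    // A decoder that throws on malformed input instead of substituting U+FFFD.
    private static readonly UTF8Encoding Strict =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // Decode x as UTF-8 and encode the resulting string again.
    public static byte[] RoundTrip(byte[] x) =>
        Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(x));

    // Element-wise comparison of the round-tripped bytes with the input.
    public static bool SurvivesRoundTrip(byte[] x) =>
        RoundTrip(x).SequenceEqual(x);

    // True if x decodes without hitting any malformed sequence.
    public static bool IsValidUtf8(byte[] x)
    {
        try { Strict.GetString(x); return true; }
        catch (DecoderFallbackException) { return false; }
    }
}
```

For new byte[] { 0xFF }, RoundTrip returns EF BF BD (the UTF-8 encoding of U+FFFD), SurvivesRoundTrip is false and IsValidUtf8 is false. For new byte[] { 0x41 } both checks return true.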

+3

You cannot compare arrays with ==. This has nothing to do with Encoding.UTF8. Try:

var a = new byte[] { 1 };
var b = new byte[] { 1 };
bool res = a == b; // false: == on arrays compares references, not contents
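As a small sketch of the difference, the two hypothetical helpers below put the reference comparison next to an element-wise comparison with LINQ's SequenceEqual. The class and method names are made up for illustration.

```csharp
using System;
using System.Linq;

public static class ArrayEqualityDemo
{
    // What == does for arrays: true only if both variables point at the same object.
    public static bool SameReference(byte[] a, byte[] b) => a == b;

    // Element-wise comparison: same length and equal bytes at every index.
    public static bool SameContents(byte[] a, byte[] b) => a.SequenceEqual(b);
}
```

For two separately allocated arrays that both contain the single byte 1, SameReference returns false and SameContents returns true.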
+1

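First, the Encoding classes are designed to take a sequence of char values and turn it into a sequence of byte values, and back again. Depending on the Encoding, one char may become more than one byte, and several bytes may decode to a single char. (Also, not every Encoding can represent every char; for example, Encoding.ASCII only handles char values in the range [0, 128).)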
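The Encoding.ASCII limitation is easy to observe. This is a minimal sketch with a hypothetical helper (AsciiLossDemo.RoundTrip) that encodes a string as ASCII and decodes it again; characters outside [0, 128) come back as a question mark.

```csharp
using System;
using System.Text;

public static class AsciiLossDemo
{
    // Encode s as ASCII, then decode the resulting bytes again.
    public static string RoundTrip(string s) =>
        Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(s));
}
```

RoundTrip("abc") returns "abc", while RoundTrip("caf\u00E9") returns "caf?", because the encoder substitutes the byte 0x3F for the character it cannot represent.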

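The important point is that an Encoding is a way of representing char values as byte values so that text can be stored or transmitted. (Some Encodings can represent all of Unicode, such as Encoding.Unicode and Encoding.UTF8.)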

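So what happens if you start with byte values instead? Unless those bytes were produced by the same Encoding, nothing guarantees that they are valid input for it. Encoding.GetBytes followed by Encoding.GetChars/Encoding.GetString is the round trip that is meant to work, not the reverse.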

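JPEG is a useful comparison. You can encode an image as JPEG and decode it back into an image, much as you can encode a string and decode it again. But what if you take arbitrary bytes that never came from a JPEG encoder and try to decode them as a JPEG? The decoder has no meaningful picture to give you, and re-encoding whatever it produces will not reproduce the bytes you started with.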

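The same applies here. In UTF-8, char values below 128 are encoded as a single byte, and everything else as a multi-byte sequence: a lead byte of the form 110xxxxx, 1110xxxx or 11110xxx followed by continuation bytes of the form 10xxxxxx, which together make up one character (several byte values, one code point). If the decoder meets a byte of the form 10xxxxxx with no lead byte before it, the input is malformed and no character corresponds to it. What does the decoder do? It substitutes a placeholder and carries on. Unicode has a character for exactly this purpose: the replacement character, U+FFFD.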
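The bit patterns above can be turned into a small classifier. This is an illustrative sketch only: Utf8ByteKinds and Classify are hypothetical names, and the function looks at the leading bits of one byte without checking whether the surrounding sequence is well formed.

```csharp
using System;
using System.Text;

public static class Utf8ByteKinds
{
    // Classify a byte by its leading bits only. Some values that match a
    // lead pattern (such as 0xC0 and 0xC1) never occur in valid UTF-8.
    public static string Classify(byte b)
    {
        if ((b & 0x80) == 0x00) return "single";       // 0xxxxxxx
        if ((b & 0xC0) == 0x80) return "continuation"; // 10xxxxxx
        if ((b & 0xE0) == 0xC0) return "lead-2";       // 110xxxxx
        if ((b & 0xF0) == 0xE0) return "lead-3";       // 1110xxxx
        if ((b & 0xF8) == 0xF0) return "lead-4";       // 11110xxx
        return "invalid";                              // 11111xxx
    }
}
```

For instance, Classify(0x80) returns "continuation", and decoding the two bytes 0x41 0x80 with Encoding.UTF8.GetString yields "A" followed by U+FFFD, since the stray continuation byte has no lead byte in front of it.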

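So if you start from byte values and decode them to char values, then unless the bytes are valid for that encoding, converting the string back to byte values will not return what you started with. Information has been lost.

In short, an Encoding is for turning char into byte. If what you need is to carry arbitrary byte values as char values, use a tool that was made for that instead. :-)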

The .NET Framework supports MIME Base-64 through Convert.ToBase64String and Convert.FromBase64String.
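A minimal sketch of the Base-64 route, for comparison with the UTF-8 round trip: the helper name Base64Demo.RoundTrip is made up, and the two Convert methods are the ones named above.

```csharp
using System;

public static class Base64Demo
{
    // Bytes -> Base-64 text -> bytes. Every byte value is representable,
    // so this round trip returns the original contents.
    public static byte[] RoundTrip(byte[] x) =>
        Convert.FromBase64String(Convert.ToBase64String(x));
}
```

Convert.ToBase64String(new byte[] { 0xFF }) produces "/w==", and RoundTrip returns a new array with the same contents as its input, including bytes such as 0xFF that are not valid UTF-8.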

+1
