
Why are HtmlEncode and HtmlDecode not isomorphic in .NET?

I find it amazing and quite annoying.

Example:

    Decode("&rdquo;") => "”"
    Encode("”")       => "”"

Relevant classes:

    .NET 4:   System.Net.WebUtility
    .NET 3.5: System.Web.HttpUtility

I understand that a web page can be Unicode, but in my case the output cannot be UTF-8.

Is there something (maybe an HtmlWriter class) that could do this without my having to reinvent the wheel?

Alternative solution:

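    // Requires: using System.Text;
    string HtmlUnicodeEncode(string input)
    {
        var sb = new StringBuilder();
        foreach (var c in input)
        {
            if (c > 127)
            {
                // Non-ASCII: emit a hexadecimal numeric character reference.
                // Note: this works per UTF-16 code unit, so a surrogate pair
                // is written as two separate references.
                sb.AppendFormat("&#x{0:X4};", (int)c);
            }
            else
            {
                // ASCII is copied through unchanged (no escaping of & or <).
                sb.Append(c);
            }
        }
        return sb.ToString();
    }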
1 answer

It is not possible to create an isomorphic pair of HTML codecs. Consider:

    HtmlDecode("&rdquo;&#8221;&#x201D;”") -> "””””"

How do you get back from "””””" to the original string?
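The many-to-one nature of decoding can be checked directly. The following is a minimal sketch, assuming .NET 4 or later so that System.Net.WebUtility is available; the class name DecodeDemo and the Spellings array are hypothetical and only serve the illustration.

```csharp
using System;
using System.Net;

static class DecodeDemo
{
    // Four distinct spellings of U+201D (right double quotation mark).
    public static readonly string[] Spellings =
    {
        "&rdquo;",   // named entity (HTML only)
        "&#8221;",   // decimal numeric character reference
        "&#x201D;",  // hexadecimal numeric character reference
        "\u201D"     // the literal character
    };

    public static void Main()
    {
        foreach (var s in Spellings)
        {
            string decoded = WebUtility.HtmlDecode(s);
            // Print the code point of the first decoded character
            // so the result does not depend on the console encoding.
            Console.WriteLine("{0,-10} -> U+{1:X4}", s, (int)decoded[0]);
        }
    }
}
```

Since every spelling collapses to the same character, no encoder can know which one the original text used.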

HtmlEncode has to choose one encoding for ”, and it goes for the literal ” as the shortest, most readable alternative. As long as your output encoding can carry Unicode, this is almost certainly the best choice.

If it cannot, that is a different argument. The advantage of &rdquo; is that it is slightly more readable than &#8221;, but it only works in HTML (not XML), and you still have to fall back to numeric character references for all the Unicode characters that have no built-in entity name, so it is less consistent. For a character-reference encoder, create an XmlTextWriter using the ASCII encoding and call WriteString.
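A minimal sketch of such a character-reference encoder follows. It is a variant under stated assumptions rather than the exact recipe above: instead of constructing an XmlTextWriter directly, it uses XmlWriter.Create with XmlWriterSettings.Encoding set to ASCII, which, as far as I know, writes characters the target encoding cannot represent as numeric character references. The names AsciiHtml and EncodeAsAscii are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class AsciiHtml
{
    // Hypothetical helper: escapes text for an ASCII-only output.
    public static string EncodeAsAscii(string input)
    {
        var settings = new XmlWriterSettings
        {
            Encoding = Encoding.ASCII,                    // target encoding
            ConformanceLevel = ConformanceLevel.Fragment, // allow bare text, no root element
            OmitXmlDeclaration = true                     // no <?xml ...?> header
        };

        using (var stream = new MemoryStream())
        {
            using (var writer = XmlWriter.Create(stream, settings))
            {
                // WriteString escapes &, < and > in text content.
                writer.WriteString(input);
            } // disposing the writer flushes it into the stream

            return Encoding.ASCII.GetString(stream.ToArray());
        }
    }

    public static void Main()
    {
        Console.WriteLine(EncodeAsAscii("say \u201Chello\u201D & <wave>"));
    }
}
```

Because this applies XML text escaping, straight double quotes are left alone in text content; if the result goes into an attribute value, quote handling needs separate care.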
