Unescaping XML entities using XmlReader in .NET?

I am trying to unescape XML entities in a string in .NET (C#), but I can't get it to work correctly.

For example, if I have AT&amp;T, it should be translated to AT&T.

One way is to use HttpUtility.HtmlDecode(), but that is meant for HTML.

I have two questions:

  • Can I use HttpUtility.HtmlDecode() to decode XML entities?

  • How can I use XmlReader (or something similar) for this? I tried the following, but it always returns an empty string:

    static string ReplaceEscapes(string text)
    {
        StringReader reader = new StringReader(text);
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ConformanceLevel = ConformanceLevel.Fragment;
        using (XmlReader xmlReader = XmlReader.Create(reader, settings))
        {
            return xmlReader.ReadString();
        }
    }
+9
xml entities translate
Mar 14
5 answers

Your approach #2 can work, but you need to call xmlReader.Read() (or xmlReader.MoveToContent()) before ReadString().

I assume #1 would also be acceptable, although there are edge cases such as &reg;, which is a valid HTML entity but not an XML entity. What should your unescaper do with it? Throw an exception, as a strict XML parser would, or simply return "®", as an HTML parser would?
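To make the edge case concrete, here is a minimal sketch that puts the two behaviours side by side. The class and method names (EntityEdgeCase, DecodeAsXml, DecodeAsHtml) are hypothetical, and System.Net.WebUtility.HtmlDecode is swapped in for HttpUtility.HtmlDecode so the sketch does not need a System.Web reference; that substitution is an assumption, not part of the answer above.

```csharp
using System;
using System.IO;
using System.Net;
using System.Xml;

public static class EntityEdgeCase
{
    // XML-parser behaviour: parses the text as an XML fragment.
    // An entity that is not declared (such as &reg;) makes the reader
    // throw an XmlException.
    public static string DecodeAsXml(string text)
    {
        var settings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment
        };

        using (var reader = XmlReader.Create(new StringReader(text), settings))
        {
            // Position the reader on the first content node before reading.
            reader.MoveToContent();
            return reader.ReadString();
        }
    }

    // HTML-parser behaviour: also knows the HTML named entities.
    // WebUtility is an assumed stand-in for HttpUtility here.
    public static string DecodeAsHtml(string text)
    {
        return WebUtility.HtmlDecode(text);
    }
}
```

With these helpers, DecodeAsXml("AT&amp;T") and DecodeAsHtml("AT&amp;T") should both give AT&T, while only DecodeAsXml is expected to throw on "&reg;". Which behaviour is right depends on whether the input is guaranteed to be XML-escaped.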

+8
Mar 14

HTML and XML escaping are closely related. As you said, HttpUtility has both HtmlEncode and HtmlDecode methods. These will also work with XML, as there are only a few characters that need escaping in both HTML and XML: < , > , " , ' and & .

The disadvantage of using the HttpUtility class is that you need a reference to the System.Web DLL, which also brings in a lot of other things that you probably don't need.

Specifically for XML, the SecurityElement class has an Escape method that will do the encoding, but it has no corresponding Unescape method. You therefore have several options:

  • use HttpUtility.HtmlDecode() and put up with the reference to System.Web

  • roll your own decoding method that takes care of the special characters (there are only a few; look at the static constructor of SecurityElement in Reflector to see the full list)

  • use a (hacky) solution, for example:

    public static string Unescape(string text)
    {
        XmlDocument doc = new XmlDocument();
        string xml = string.Format("<dummy>{0}</dummy>", text);
        doc.LoadXml(xml);
        return doc.DocumentElement.InnerText;
    }

Personally, I would use HttpUtility.HtmlDecode() if I already had a reference to System.Web, or roll my own if not. I don't like your XmlReader approach: XmlReader implements IDisposable, which usually indicates that it holds resources that need to be released, so it may be an expensive operation.
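As an illustration of the "roll your own" option, here is a minimal sketch. The class name XmlEntityDecoder and its regular expression are hypothetical and not taken from SecurityElement; the sketch assumes only the five predefined XML entities and numeric character references need handling, and it leaves anything else (such as &reg;) untouched rather than throwing.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class XmlEntityDecoder
{
    // Matches the five predefined XML entities plus decimal (&#10;)
    // and hexadecimal (&#xA;) character references.
    private static readonly Regex EntityPattern = new Regex(
        @"&(#x[0-9A-Fa-f]+|#[0-9]+|lt|gt|amp|apos|quot);",
        RegexOptions.Compiled);

    public static string Unescape(string text)
    {
        if (string.IsNullOrEmpty(text))
            return string.Empty;

        // A single pass means "&amp;lt;" becomes "&lt;" and is not
        // decoded a second time into "<".
        return EntityPattern.Replace(text, match =>
        {
            string body = match.Groups[1].Value;
            switch (body)
            {
                case "lt": return "<";
                case "gt": return ">";
                case "amp": return "&";
                case "apos": return "'";
                case "quot": return "\"";
            }

            // Numeric reference: body is "#123" or "#x7B".
            // Out-of-range values make Parse or ConvertFromUtf32 throw.
            int codePoint = body[1] == 'x'
                ? int.Parse(body.Substring(2), NumberStyles.HexNumber,
                            CultureInfo.InvariantCulture)
                : int.Parse(body.Substring(1), CultureInfo.InvariantCulture);
            return char.ConvertFromUtf32(codePoint);
        });
    }
}
```

This avoids both the System.Web reference and the cost of creating a parser per call, at the price of not validating the input as XML.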

+12
Mar 14

This works:

    using (XmlReader xmlReader = XmlReader.Create(reader, settings))
    {
        if (xmlReader.Read())
        {
            return xmlReader.ReadString();
        }
    }
+1
Mar 14 '11 at 21:41

I found that the top answer has a slight flaw if your input text ends with certain whitespace characters, such as carriage returns.

The string "Testing&#10;" loses its trailing whitespace.

If you combine the approach from the question with adrianbanks' wrapper tag, you get the following, which works.

    public static string UnescapeUnicode(string line)
    {
        using (StringReader reader = new StringReader("<a>" + line + "</a>"))
        {
            using (XmlReader xmlReader = XmlReader.Create(reader))
            {
                xmlReader.MoveToContent();
                return xmlReader.ReadElementContentAsString();
            }
        }
    }
+1
May 25 '12 at 03:23

This works as well and has the smallest code:

    public static string DecodeString(string encodedString)
    {
        // Guard against null or empty input, for which Read() returns false.
        if (string.IsNullOrEmpty(encodedString))
            return string.Empty;

        XmlTextReader xtr = new XmlTextReader(encodedString, XmlNodeType.Element, null);
        if (xtr.Read())
            return xtr.ReadString();

        throw new Exception("Error decoding xml string : " + encodedString);
    }

Update 1: Hmm, this doesn't seem to work if encodedString is "": in that case xtr.Read() returns false.

Update 2: Workaround added.

Update 3: This seems to work even better:

    public static string DecodeString(string encodedString)
    {
        XmlTextReader xtr = new XmlTextReader(encodedString, XmlNodeType.Element, null);
        xtr.MoveToContent();
        return xtr.Value;
    }
0
Mar 10 '16 at 14:23