How can I write decoded HTML using HTMLAgilityPack?

I have had partial success writing HTML to a DOCX file using HTMLAgilityPack and the DOCX library. However, the text I write into the .docx file still contains HTML-encoded entities, for example:

La ciudad de Los &Aacute;ngeles (California) ha sincronizado su red completa de sem&aacute;foros &mdash;casi 4.500&mdash;, que cubre una zona de 1.215 kil&oacute;metros cuadrados (469 millas cuadradas). Seg&uacute;n el diario 

I want it to look more like this:

 La ciudad de Los Angeles (California) ha sincronizado su red completa de semaforos - casi 4.500 -, que cubre una zona de 1.215 kilometros cuadrados (469 millas cuadradas). Segun el diario 

To show some context, this is the code I use:

    private void ParseHTMLAndConvertBackToDOCX()
    {
        List<string> sourceText = new List<string>();
        List<string> targetText = new List<string>();
        HtmlAgilityPack.HtmlDocument htmlDocSource = new HtmlAgilityPack.HtmlDocument();
        HtmlAgilityPack.HtmlDocument htmlDocTarget = new HtmlAgilityPack.HtmlDocument();

        // There are various options, set as needed
        htmlDocSource.OptionFixNestedTags = true;
        htmlDocTarget.OptionFixNestedTags = true;

        htmlDocSource.Load(sourceHTMLFilename);
        htmlDocTarget.Load(targetHTMLFilename);

        // Populate generic list of string with source text lines
        if (htmlDocSource.DocumentNode != null)
        {
            IEnumerable<HtmlAgilityPack.HtmlNode> pNodes = htmlDocSource.DocumentNode.SelectNodes("//text()");
            foreach (HtmlNode sText in pNodes)
            {
                if (!string.IsNullOrWhiteSpace(sText.InnerText))
                {
                    sourceText.Add(sText.InnerText);
                }
            }
        }

...

The most relevant line is undoubtedly:

 sourceText.Add(sText.InnerText); 

Should it be anything other than InnerText?

Is something like the following possible?

    sourceText.Add(sText.InnerText.Decode());

Intellisense is not working for this, even though the project compiles and runs, so I cannot see what options HtmlNode offers besides InnerText. I know there are OuterText, InnerHTML and OuterHTML, though...

+6
2 answers

Try:

 sourceText.Add(HttpUtility.HtmlDecode(myEncodedString)); 
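As a minimal sketch of the same idea: HttpUtility lives in System.Web, which may need an extra assembly reference in a desktop project. The example below swaps in System.Net.WebUtility.HtmlDecode instead, which performs the same kind of entity decoding. The class and method names (EntityDecoding, Decode) are hypothetical and only serve the illustration.

```csharp
using System;
using System.Net;

static class EntityDecoding
{
    // Hypothetical helper (not part of any library): turns HTML entities
    // such as &aacute; or &amp; back into the characters they stand for.
    public static string Decode(string encoded)
    {
        // WebUtility.HtmlDecode is used here in place of
        // HttpUtility.HtmlDecode so no System.Web reference is needed.
        return WebUtility.HtmlDecode(encoded);
    }
}
```

In the question's loop this would be called as sourceText.Add(EntityDecoding.Decode(sText.InnerText)).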


+5

You can use HtmlEntity.DeEntitize(sText.InnerText) from HTMLAgilityPack.
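A rough sketch of how this could slot into the question's text-node loop, assuming the HtmlAgilityPack package is referenced. The class and method names (TextExtractor, ExtractDecodedText) are hypothetical, and the HTML is loaded from a string rather than a file only to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class TextExtractor
{
    // Hypothetical helper: collects the non-blank text nodes of an HTML
    // string and decodes their entities with HtmlEntity.DeEntitize.
    public static List<string> ExtractDecodedText(string html)
    {
        var doc = new HtmlDocument();
        doc.OptionFixNestedTags = true;
        doc.LoadHtml(html);

        var lines = new List<string>();

        // SelectNodes can return null when the XPath matches nothing,
        // so guard before iterating.
        HtmlNodeCollection textNodes = doc.DocumentNode.SelectNodes("//text()");
        if (textNodes == null)
        {
            return lines;
        }

        foreach (HtmlNode node in textNodes)
        {
            if (string.IsNullOrWhiteSpace(node.InnerText))
            {
                continue;
            }
            // Decode entities such as &aacute; before storing the text.
            lines.Add(HtmlEntity.DeEntitize(node.InnerText));
        }
        return lines;
    }
}
```

The null guard is an addition that the question's code does not have; without it, a document with no text nodes would make the foreach throw.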

+10