Can I use the Html Agility Pack for this?

I could not find any tutorials on the project's site. Can I use the Html Agility Pack to parse a string?

For example, if I have

    string str = "<b>Some code </b>";

Is it possible to use the Agility Pack to get rid of the <b> tags? All the examples I have seen so far load complete HTML documents.

+6
c# html-agility-pack
2 answers

If it is html, then yes.

    using HtmlAgilityPack;

    string str = "<b>Some code</b>";
    // not sure if this wrapper is needed
    string html = string.Format("<html><head></head><body>{0}</body></html>", str);
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(html);

    // look at XPath tutorials for how to select elements
    // select the first <b> element anywhere in the document
    // ("//b" rather than "b[1]", because <b> is nested inside <html><body>)
    HtmlNode bNode = doc.DocumentNode.SelectSingleNode("//b");
    string boldText = bNode.InnerText;
+8

I don't think this is really the best use of HtmlAgilityPack.

I usually see people trying to parse large amounts of html using regular expressions, and I point them to HtmlAgilityPack, but in this case I think it would be better to use a regular expression.

Roy Osherove has a blog post describing how to strip all HTML from a fragment:

Even if you get the XPath right in Mika Kolari's sample, it only works for a fragment that contains a <b> tag, and it will break if the markup changes.
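
As a rough sketch of the regular-expression approach, something like the following could work. The class and method names (`HtmlStripper`, `StripTags`) are made up for illustration, and the pattern is a simple assumption on my part, not necessarily the one from the blog post:

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlStripper
{
    // Hypothetical helper: removes anything that looks like a tag,
    // i.e. the shortest run of characters between '<' and '>'.
    public static string StripTags(string input)
    {
        return Regex.Replace(input, "<.*?>", string.Empty);
    }
}
```

Calling `HtmlStripper.StripTags("<b>Some code</b>")` returns `Some code`. This is only suitable for small, simple fragments like the one in the question: the pattern does not handle tags that span several lines, and it knows nothing about comments, script blocks, or a `>` inside an attribute value.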

+2
