This seems to be a recurring question, but here it goes.
I have HTML that is well-formed (it comes from a controlled source, so this can be taken as given). I need to iterate over the contents of the HTML body, find all the words in the document, make some changes to those words, and save the result.
For example, I have a sample.html file, and I want to run it through my application and produce output.html, which matches the original exactly apart from my changes.
I found the following using HTMLAgilityPack, but all the examples I found look at the attributes of specific tags. Is there a slight modification that will look at the content instead and let me make my changes?
    HtmlDocument HD = new HtmlDocument();
    HD.Load(@"e:\test.htm");
    var NoAltElements = HD.DocumentNode.SelectNodes("//img[not(@alt)]");
    if (NoAltElements != null)
    {
        foreach (HtmlNode HN in NoAltElements)
        {
            HN.Attributes.Append("alt", "no alt image");
        }
    }
    HD.Save(@"e:\test.htm");
The example above finds img tags that have no alt attribute and adds one. I want to go over all the tags inside <body> and do something with their text content (which may include creating new tags in the process).
A very simple example of what I might do is take the following input:
    <html>
    <head><title>Some Title</title></head>
    <body>
    <h1>This is my page</h1>
    <p>This is a paragraph of text.</p>
    </body>
    </html>
and produce output in which the words alternate between being uppercased and being wrapped in italics:
    <html>
    <head><title>Some Title</title></head>
    <body>
    <h1>THIS <em>is</em> MY <em>page</em></h1>
    <p>THIS <em>is</em> A <em>paragraph</em> OF <em>text</em>.</p>
    </body>
    </html>
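To pin down the transformation itself, here is a minimal sketch of what I mean for a single run of text, leaving the HTML traversal (the part I am asking about) out entirely. The names WordAlternator and Transform are hypothetical, and treating a "word" as a run of letters is my assumption:

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordAlternator
{
    // Hypothetical helper: rewrites the text of one element.
    // Words at even positions are uppercased, words at odd positions
    // are wrapped in <em>; everything between words is left untouched.
    // Assumption: a "word" is any run of Unicode letters.
    public static string Transform(string text)
    {
        int index = 0; // word position, restarts for every call
        return Regex.Replace(text, @"\p{L}+", m =>
            index++ % 2 == 0
                ? m.Value.ToUpperInvariant()
                : "<em>" + m.Value + "</em>");
    }
}
```

With this, WordAlternator.Transform("This is a paragraph of text.") returns THIS <em>is</em> A <em>paragraph</em> OF <em>text</em>. which is the content of the <p> in the desired output. What I am missing is how to apply something like this to the text inside each tag and have the result end up as real markup in the saved document.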
Ideas, suggestions?
Elie