When I realized that I needed to create an index for about 50 XHTML pages that might be added, deleted, renamed, or moved in the future, I thought: "No problem, I will write a quick index generator using LINQ to XML, since XHTML is definitely considered XML."
Of course, as soon as I tried to run it, I learned that XLINQ was choking on XHTML entities such as &nbsp;. I worked around it using the following algorithm:
- Read the XHTML file into a string.
- Use regular-expression search and replace on that string to add a section to the DOCTYPE that defines all the relevant entities (since I only care about the "title" attribute in the files I read, and my output file does not use any entities right now, it just sets them all to empty, but I can add the actual values later).
- Parse the result into an XDocument.
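For readers who want to see the shape of this workaround, here is a minimal sketch of the loading steps. It is not my actual code: the class name XhtmlLoader, the method names, and the short entity list are made up for illustration, and it assumes the DOCTYPE does not already have an internal subset. The resolver is set to null on the assumption that you do not want the reader to fetch the external XHTML DTD.

```csharp
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Linq;

public static class XhtmlLoader
{
    // Hypothetical entity list; extend it with whatever the pages actually use.
    private static readonly string[] EntityNames = { "nbsp", "copy", "mdash" };

    // Matches a DOCTYPE declaration that has no internal subset yet.
    private static readonly Regex DoctypeEnd = new Regex(@"(<!DOCTYPE[^>\[]*)>");

    // Adds an internal subset that declares every entity as an empty string.
    public static string InjectEntities(string xhtml)
    {
        var subset = new StringBuilder();
        foreach (var name in EntityNames)
            subset.Append("<!ENTITY ").Append(name).Append(" \"\">");
        // Replace only the first match, i.e. the document's DOCTYPE.
        return DoctypeEnd.Replace(xhtml, "$1 [" + subset + "]>", 1);
    }

    // Parses an XHTML string after injecting the entity declarations.
    public static XDocument Parse(string xhtml)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse, // read the internal subset
            XmlResolver = null                   // do not download the external DTD
        };
        using (var reader = XmlReader.Create(new StringReader(InjectEntities(xhtml)), settings))
            return XDocument.Load(reader);
    }

    // Reads the whole file into a string, then parses it.
    public static XDocument LoadFile(string path)
    {
        return Parse(File.ReadAllText(path));
    }
}
```

Because every entity is declared as empty, a title such as "A&nbsp;B" comes back as "AB"; putting real replacement values in the declarations would preserve the characters.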
To save the file, I do the reverse:
- Save the XDocument to a string.
- Strip out the entity definitions.
- Save to file.
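The saving steps can be sketched the same way. Again, this is an illustration under assumptions, not my real code: XhtmlSaver and its method names are hypothetical, and the regular expression assumes the internal subset contains no "]" character, which holds for plain entity declarations.

```csharp
using System.IO;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class XhtmlSaver
{
    // Matches a DOCTYPE with an internal subset: <!DOCTYPE ... [ ... ]>
    private static readonly Regex InternalSubset =
        new Regex(@"(<!DOCTYPE[^>\[]*?)\s*\[[^\]]*\]\s*>");

    // Removes the [ ... ] section from the first DOCTYPE in the text.
    public static string StripEntities(string xml)
    {
        return InternalSubset.Replace(xml, "$1>", 1);
    }

    // Serializes the document to a string and strips the entity definitions.
    // Note: XDocument.ToString() does not emit the XML declaration.
    public static string Serialize(XDocument doc)
    {
        return StripEntities(doc.ToString());
    }

    // Writes the cleaned-up markup to disk.
    public static void SaveFile(XDocument doc, string path)
    {
        File.WriteAllText(path, Serialize(doc));
    }
}
```

An alternative that avoids the second regular expression might be to set doc.DocumentType.InternalSubset to null before serializing. Either way, entity references that were expanded on load are not restored on save.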
My question is: are there any libraries (preferably built into .NET) that I can use to read XHTML files into XDocuments? The code I wrote achieved its goal (generating the current index and letting me test the rest of the generator program), and I would prefer not to spend time testing it further if someone else has already written and tested the same thing.
Thank you for your time,
Ria.
Edit: Thank you so much; it works! I still need to process the strings a bit when I save the XHTML (I think the library wasn't actually made for this :)), and I had to play a little with the source of the Agility Pack to make it stop indiscriminately wrapping a CDATA section around the inner content of each style element (even if it already had one present), but that is the point of Open Source, right?
Ria