Removing an element by class name using HtmlAgilityPack in C#

I use Html Agility Pack to read the contents of my HTML document into a string. After that, I would like to remove certain elements from this content by their class, but I am running into a problem.

My HTML is as follows:

  <div id="wrapper">
    <div class="maincolumn">
      <div class="breadCrumbContainer">
        <div class="breadCrumbs"></div>
      </div>
      <div class="seo_list">
        <div class="seo_head">Header</div>
      </div>
      Content goes here...
    </div>
  </div>

I used an XPath selector to get the wrapper div and read everything inside it through the InnerHtml property, as follows:

  node = doc.DocumentNode.SelectSingleNode("//div[@id='wrapper']");
  if (node != null)
  {
      pageContent = node.InnerHtml;
  }

Next, I would like to remove the div with class "breadCrumbContainer". However, when I use the code below I get an error saying the node was "not found in collection":

  node = doc.DocumentNode.SelectSingleNode("//div[@id='wrapper']");
  node = node.RemoveChild(node.SelectSingleNode("//div[@class='breadCrumbContainer']"));
  if (node != null)
  {
      pageContent = node.InnerHtml;
  }

Can someone shed some light on this, please? I am new to XPath and very new to the HtmlAgilityPack library.

Thanks,

Dave

+6
c# xpath xslt html-agility-pack
2 answers

This is because RemoveChild can only remove a direct child, not a grandchild. In your HTML, the "breadCrumbContainer" div is a child of "maincolumn", not of "wrapper". Try this instead:

  HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='breadCrumbContainer']");
  node.ParentNode.RemoveChild(node);
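
For completeness, here is a minimal sketch of the whole flow from the question, assuming the HtmlAgilityPack NuGet package is referenced. The sample markup string and the variable names (wrapper, crumbs) are mine and only for illustration. It also avoids two things that can trip you up in the original code: an XPath starting with "//" searches the whole document even when called on a node (use ".//" to search only below that node), and RemoveChild, as far as I know, returns the node that was removed, so assigning its result back to node would leave you reading the InnerHtml of the removed div rather than of the wrapper.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Sample markup modeled on the HTML in the question (illustrative only).
        const string html =
            "<div id=\"wrapper\"><div class=\"maincolumn\">" +
            "<div class=\"breadCrumbContainer\"><div class=\"breadCrumbs\"></div></div>" +
            "<div class=\"seo_list\"><div class=\"seo_head\">Header</div></div>" +
            "Content goes here...</div></div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode wrapper = doc.DocumentNode.SelectSingleNode("//div[@id='wrapper']");
        if (wrapper == null)
        {
            return;
        }

        // ".//" restricts the search to descendants of wrapper;
        // a bare "//" would search from the document root instead.
        HtmlNode crumbs = wrapper.SelectSingleNode(".//div[@class='breadCrumbContainer']");

        // Remove() detaches the node from its actual parent, at any depth.
        if (crumbs != null)
        {
            crumbs.Remove();
        }

        // Read the content from wrapper, not from the removed node.
        string pageContent = wrapper.InnerHtml;
        Console.WriteLine(pageContent);
    }
}
```

If several elements share the class, SelectNodes with the same XPath should return all of them (or null when nothing matches), and you can call Remove() on each one in a loop.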
+10

This is a super simple task for XSLT:

  <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="node()|@*">
      <xsl:copy>
        <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
    </xsl:template>

    <xsl:template match="div[@class='breadCrumbContainer' and ancestor::div[@id='wrapper']]"/>
  </xsl:stylesheet>

When this transformation is applied to the provided XML document (with another <div> added and everything wrapped in a top-level <html> element to make it more complex and realistic):

  <html>
    <div id="wrapper">
      <div class="maincolumn">
        <div class="breadCrumbContainer">
          <div class="breadCrumbs"></div>
        </div>
        <div class="seo_list">
          <div class="seo_head">Header</div>
        </div>
        Content goes here...
      </div>
    </div>
    <div> Something else here </div>
  </html>

the desired, correct result is output:

  <html>
    <div id="wrapper">
      <div class="maincolumn">
        <div class="seo_list">
          <div class="seo_head">Header</div>
        </div>
        Content goes here...
      </div>
    </div>
    <div> Something else here </div>
  </html>
0
