Remove all nodes after the specified node

I grab a div of text from a URL and would like to remove everything after a paragraph with the class backtotop. I saw a snippet using traverse here on Stack Overflow that looks promising, but I can't figure out how to apply it so that @el only contains the content up to the first p.backtotop in the div.

My code is:

    require 'nokogiri'
    require 'open-uri'

    @doc = Nokogiri::HTML(URI.open(url))
    @el = @doc.css("div")[0]

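The traverse snippet:

    doc = Nokogiri::HTML(code)
    # at_css returns a single node; css returns a NodeSet,
    # which would never compare equal to a single node
    stop_node = doc.at_css("p.backtotop")
    doc.traverse do |node|
      break if node == stop_node
      # else, do whatever, eg `puts node.name`
    end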
+4
2 answers

It seems I should select what I need rather than remove what I don't; see the excellent solution here: Nokogiri: select content between elements A and B

0
  • Find the desired div.
  • Find the stop item you want, and then find all of the following siblings.
  • Remove them.

For instance:

    <body>
    <div id="a">
      <h2>My Section</h2>
      <p class="backtotop">Back to Top</p>
      <p>More Content</p>
      <p>Even More Content</p>
    </div>
    </body>

    require 'nokogiri'
    doc = Nokogiri::HTML(my_html)
    div = doc.at('#a')
    div.at('.backtotop').xpath('following-sibling::*').remove
    puts div
    #=> <div id="a">
    #=>   <h2>My Section</h2>
    #=>   <p class="backtotop">Back to Top</p>
    #=>
    #=>
    #=> </div>

Here's a more complex example where the backtotop element might not be a direct child of the div:

    <body>
    <div id="b">
      <h2>Another Section</h2>
      <section>
        <p class="backtotop">Back to Top</p>
        <p>More Content</p>
      </section>
      <p>Even More Content</p>
    </div>
    </body>

    require 'nokogiri'
    doc = Nokogiri::HTML(my_html)
    div = doc.at('#b')
    n = div.at('.backtotop')
    until n==div
      n.xpath('following-sibling::*').remove
      n = n.parent
    end
    puts div
    #=> <div id="b">
    #=>   <h2>Another Section</h2>
    #=>   <section><p class="backtotop">Back to Top</p>
    #=>
    #=>   </section>
    #=> </div>

If your HTML is more complex than the above, please provide an actual sample along with the result you want. That is good advice for any future question you ask.

+3
