What XPath can I use to get all text nodes from the first paragraph onward, including the paragraph's own text?

I am new to Nokogiri and to Ruby in general.

I want to get the text of all nodes in the document, starting from (and including) the first paragraph node.

I tried the following XPath but got nothing back:

puts page.search("//p[0]/text()[next-sibling::node()]") 

This does not work. What do I need to change?

You should find the <p/> node and return all text() nodes, both inside it and after it. Depending on what Nokogiri's XPath engine supports, use one of these queries:

 //p[1]/(descendant::text() | following::text()) 

If that does not work, use this one instead. It has to locate the first paragraph twice, so it may be slightly (but probably imperceptibly) slower:

 (//p[1]/descendant::text() | //p[1]/following::text()) 

An XPath 2.0 alternative, which Nokogiri most likely does not support, would be:

 //text()[//p[1] << .] 

which means "all text nodes preceded by the first <p/> node in the document."

This works with Nokogiri (which is built on top of libxml2 and supports XPath 1.0 expressions):

 //p[1]//text() | //p[1]/following::text() 

Proof:

 require 'nokogiri'
 html = '<body><h1>A</h1><p>B <b>C</b></p><p>D <b>E</b></p></body>'
 doc = Nokogiri.HTML(html)
 p doc.xpath('//p[1]//text() | //p[1]/following::text()').map(&:text)
 #=> ["B ", "C", "D ", "E"]

Note that selecting the text nodes returns a NodeSet of Nokogiri::XML::Text objects, not strings, so if you only want their text content you must map them with the .text (or .content) method.
