What XPath can I use to get all text nodes after the first paragraph of a node and including it?

Question

I am new to Nokogiri and to Ruby in general.

I want to get the text of every text node in the document, starting with the first paragraph and continuing to the end of the document.

I tried the following with XPath but get nothing:

puts page.search("//p[0]/text()[next-sibling::node()]")

This does not work. What do I need to change?

+4

ruby xpath nokogiri

user1895623 Apr 7 '13 at 19:23

2 answers

This works with Nokogiri (which is built on top of libxml2 and supports only XPath 1.0 expressions):

 //p[1]//text() | //p[1]/following::text()

Proof:

 require 'nokogiri'
 html = '<body><h1>A</h1><p>B <b>C</b></p><p>D <b>E</b></p></body>'
 doc = Nokogiri.HTML(html)
 p doc.xpath('//p[1]//text() | //p[1]/following::text()').map(&:text)
 #=> ["B ", "C", "D ", "E"]

Note that selecting the text nodes themselves returns a NodeSet of Nokogiri::XML::Text nodes, so if you want only their text content, you must map them using the .text (or .content) method.

+2

Phrogz Apr 7 '13 at 21:17

Jens Erat · Accepted Answer · 2013-04-07T20:06:02+0000

You need to find the first <p/> node and return all text() nodes, both inside it and after it. Depending on the XPath capabilities of Nokogiri, use one of these queries:

 //p[1]/(descendant::text() | following::text())

If that does not work, use this one instead. It has to look up the first paragraph twice, so it may be slightly, but probably imperceptibly, slower:

 (//p[1]/descendant::text() | //p[1]/following::text())

An XPath 2.0 alternative, which Nokogiri probably does not support, would be:

 //text()[//p[1] << .]

which means "all text nodes preceded by the first <p/> node in the document."
