Nokogiri find text in paragraphs

I want to replace inner_text in all paragraphs of my XHTML document.

I know that I can get all the text using Nokogiri like this:

doc.xpath("//text()")

But I only want to work with the text in paragraphs. How can I select all the text in paragraphs without affecting the existing anchor text of the links?

# For example: <p>some text <a href="/">This should not be changed</a> another one</p>
1 answer

For text that is an immediate child of a paragraph, use //p/text():

irb> h = '<p>some text <a href="/">This should not be changed</a> another one</p>'
=> ...
irb> doc = Nokogiri::HTML(h)
=> ...
irb> doc.xpath '//p/text()'
=> [#<Nokogiri::XML::Text:0x80ac2e04 "some text ">, #<Nokogiri::XML::Text:0x80ac26c0 " another one">]

For text that is a descendant (direct or not) of a paragraph, use //p//text(). To exclude text nodes whose parent is an anchor, you can simply subtract them:

irb> doc.xpath('//p//text()') - doc.xpath('//p//a/text()')
=> [#<Nokogiri::XML::Text:0x80ac2e04 "some text ">, #<Nokogiri::XML::Text:0x80ac26c0 " another one">]

