How to filter CDATA and get text from HTML?

I am parsing an HTML file with Nokogiri. The parsing itself works, but I only need the text, not the CDATA or JavaScript, and script and div tags are scattered throughout the file.

1 answer

You can remove all script elements, along with the JavaScript and CDATA they contain,

doc.search('script').remove 

... and then select all text nodes

 doc.xpath('//text()') 

... or just select the text nodes inside div elements

 doc.xpath('//div//text()') 