I want to parse an HTML file using Nokogiri. I can parse it, but I only need the text, not CDATA or JavaScript, and script and div tags are scattered throughout the file.
You can remove all script elements:
doc.search('script').remove
... and then select all text nodes:
doc.xpath('//text()')
... or select only the text nodes inside div elements:
doc.xpath('//div//text()')