Remove comments from inner_html

I have code that uses Nokogiri, and I'm trying to get inner_html without getting comments.

    html = Nokogiri::HTML(open(@sql_scripts_url[1])) # using first value of the array
    html.css('td[class="ms-formbody"]').each do |node|
      puts node.inner_html # prints comments
    end
1 answer

Since you have not provided any sample HTML or the desired output, here is a general solution:

You can select SGML comments in XPath using the comment() node test; you can remove them from the document by calling .remove on all comment nodes. Illustrated:

    require 'nokogiri'

    doc = Nokogiri.XML('<r><b>hello</b> <!-- foo --> world</r>')
    p doc.inner_html #=> "<r><b>hello</b> <!-- foo --> world</r>"

    # Remove every comment node from the document (modifies doc in place)
    doc.xpath('//comment()').remove
    p doc.inner_html #=> "<r><b>hello</b>  world</r>"

Note that the above modifies the document in place. If you want to leave the original document untouched, you can remove the comments from a copy instead:

    class Nokogiri::XML::Node
      # Returns inner_html with every node matching xpath removed,
      # working on a duplicate so the receiver is not modified.
      def inner_html_reject(xpath='.//comment()')
        dup.tap{ |shadow| shadow.xpath(xpath).remove }.inner_html
      end
    end

    doc = Nokogiri.XML('<r><b>hello</b> <!-- foo --> world</r>')
    p doc.inner_html_reject #=> "<r><b>hello</b>  world</r>"
    p doc.inner_html        #=> "<r><b>hello</b> <!-- foo --> world</r>"

Finally, note that if you don't need the markup and only want the text, the text itself never includes comments:

    p doc.text #=> "hello  world"