Recursively removing empty tags from an XML document with Nokogiri?

I have a nested XML document that looks like this:

    <?xml version="1.0"?>
    <phone>
      <name>test</name>
      <descr>description</descr>
      <empty/>
      <lines>
        <line>12345</line>
        <css/>
      </lines>
    </phone>

I need to remove all empty XML nodes, such as <empty/> and <css/>.

So far I have something like this:

    doc = Nokogiri::XML::DocumentFragment.parse <<-EOXML
    <phone>
      <name>test</name>
      <descr>description</descr>
      <empty/>
      <lines>
        <line>12345</line>
        <css/>
      </lines>
    </phone>
    EOXML

    phone = doc.css("phone")
    phone.children.each do |child|
      child.remove if child.inner_text == ''
    end

With the code above, only the top-level empty tag, <empty/>, is removed; the code never reaches <css/> inside the nested <lines> block. I think I need a recursive strategy. I read the Nokogiri documentation carefully and checked many examples, but have not found a solution yet.

How can I fix this?

I am using Ruby 1.9.3 and Nokogiri 1.5.10.

ruby xml recursion nokogiri
4 answers

You can find all elements that have no text node children using the XPath "/phone//*[not(text())]":

    require 'nokogiri'

    doc = Nokogiri::XML::Document.parse <<-EOXML
    <phone>
      <name>test</name>
      <descr>description</descr>
      <empty/>
      <lines>
        <line>12345</line>
        <css/>
      </lines>
    </phone>
    EOXML

    doc.xpath("/phone//*[not(text())]").remove

    # The gsub collapses the blank lines left behind by the removed elements.
    puts doc.to_s.gsub(/\n\s*\n/, "\n")
    #=> <?xml version="1.0"?>
    #=> <phone>
    #=>   <name>test</name>
    #=>   <descr>description</descr>
    #=>   <lines>
    #=>     <line>12345</line>
    #=>   </lines>
    #=> </phone>

A latecomer with a different approach, in the hope of adding some insight. This approach avoids the annoying extra blank lines and lets you keep empty elements that have attributes set.

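    require 'nokogiri'
    # blank? comes from ActiveSupport; it is already loaded inside Rails.
    require 'active_support/core_ext/object/blank'

    doc = Nokogiri::XML::Document.parse <<-EOXML
    <phone>
      <name>test</name>
      <descr>description</descr>
      <empty/>
      <lines>
        <line>12345</line>
        <css/>
      </lines>
    </phone>
    EOXML

    # Clean the children first, then remove this node if nothing is left in it.
    def traverse_and_clean(kid)
      kid.children.each { |child| traverse_and_clean(child) }
      kid.remove if kid.content.blank?
    end

    traverse_and_clean(doc)
    puts doc.to_xml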

Output:

    <?xml version="1.0"?>
    <phone>
      <name>test</name>
      <descr>description</descr>
      <lines>
        <line>12345</line>
      </lines>
    </phone>

If you need to keep empty elements that have attributes, all you have to do is change the traverse_and_clean method slightly:

    def traverse_and_clean(kid)
      kid.children.each { |child| traverse_and_clean(child) }
      # Keep empty elements that still carry attributes.
      kid.remove if kid.content.blank? && kid.attributes.blank?
    end

Another XPath-based variant, which removes the matching nodes one at a time:

    require 'nokogiri'

    doc = Nokogiri::XML::Document.parse <<-EOXML
    <phone>
      <name>test</name>
      <descr>description</descr>
      <empty/>
      <lines>
        <line>12345</line>
        <css/>
      </lines>
    </phone>
    EOXML

    # Select every element under <phone> that has no text node children.
    nodes = doc.xpath("//phone//*[not(text())]")
    nodes.each { |n| n.remove if n.elem? }

    puts doc

Output:

    <?xml version="1.0"?>
    <phone>
      <name>test</name>
      <descr>description</descr>
      <lines>
        <line>12345</line>
      </lines>
    </phone>

Like @JustinKo's answer, but using CSS selectors:

    require 'nokogiri'

    doc = Nokogiri::XML(<<EOT)
    <?xml version="1.0"?>
    <phone>
      <name>test</name>
      <descr>description</descr>
      <empty/>
      <lines>
        <line>12345</line>
        <css/>
      </lines>
    </phone>
    EOT

    doc.search(':empty').remove

    puts doc.to_xml

Looking at what it did:

    <?xml version="1.0"?>
    <phone>
      <name>test</name>
      <descr>description</descr>
      <lines>
        <line>12345</line>
      </lines>
    </phone>

Nokogiri implements many jQuery selectors, so it's always worth a look at what these extensions can do.
