How can I use Nokogiri to write a HUGE XML file?

I have a Rails application that uses delayed_job in a reporting feature to run some very large reports. One of them generates a massive XML file, and it can take literally days the bad, old way the code is written. Having seen impressive benchmarks on the Internet, I thought Nokogiri might give us some non-trivial performance gains.

However, the only examples I can find use the Nokogiri Builder to create an XML object and then call .to_xml to write the whole thing out. But I don't have anywhere near enough memory to hold a file of this size.

Can I use Nokogiri to stream or write this data to a file?

1 answer

Nokogiri is designed to build in memory: you build a DOM and it converts that to XML on the fly. It is easy to use, but there are trade-offs, and doing everything in memory is one of them.

You might want to look into using Erubis to generate the XML. Instead of gathering all the data before processing and keeping the logic in a controller, as we would with Rails, you can save memory by putting your logic in the template and having it iterate over your data, which should help with the resource demands.

If you need the XML in a file, you can do that using redirection:

erubis options templatefile.erb > xmlfile 

This is a very simple example, but it shows that you can easily define a template for generating XML:

    <% asdf = (1..5).to_a %>
    <xml>
      <element>
    <% asdf.each do |i| %>
        <subelement><%= i %></subelement>
    <% end %>
      </element>
    </xml>

which, when I call erubis test.erb, outputs:

    <xml>
      <element>
        <subelement>1</subelement>
        <subelement>2</subelement>
        <subelement>3</subelement>
        <subelement>4</subelement>
        <subelement>5</subelement>
      </element>
    </xml>
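If you would rather render the template from inside a Ruby process than shell out, something like the following might work. This is only a sketch and it swaps in Ruby's standard-library ERB for Erubis, on the assumption that a template this simple behaves the same in both. The `items` variable and the inline template are made up for illustration.

```ruby
require "erb"

# The same kind of template as above, inlined so the sketch is self-contained.
# `items` is a hypothetical variable name supplied at render time.
template = <<~TEMPLATE
  <xml>
    <element>
  <% items.each do |i| %>
      <subelement><%= i %></subelement>
  <% end %>
    </element>
  </xml>
TEMPLATE

# trim_mode "<>" drops the newline after lines that start with <% and end
# with %>, so the loop lines do not leave blank lines in the output.
renderer = ERB.new(template, trim_mode: "<>")

# Render with the template's local variables passed in as a hash.
xml = renderer.result_with_hash(items: (1..5).to_a)
```

Note that `result_with_hash` returns the whole document as one string, so you would still have to write it out yourself (for example with `File.write`).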

EDIT:

"The string concatenation was taking forever ..."

Yes, it can, simply because of garbage collection. You are not showing any example code of how you build your strings, but Ruby works better when you use << to append one string to another than when you use +.
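To illustrate the difference, here is a small sketch. The method names and the `<subelement>` payload are invented for the example, since I have not seen your code. `+` returns a brand-new String on every call, leaving the old one for the garbage collector, while `<<` appends to the existing object.

```ruby
# Builds the string with +, which allocates a new String on every pass.
def build_with_plus(count)
  xml = ""
  count.times { |i| xml = xml + "<subelement>#{i}</subelement>" }
  xml
end

# Builds the string with <<, which appends to the same String in place.
def build_with_append(count)
  xml = +"" # unary plus: an unfrozen string even under frozen_string_literal
  count.times { |i| xml << "<subelement>#{i}</subelement>" }
  xml
end

# << keeps the same object; + hands back a different one.
buffer = +"<element>"
id_before = buffer.object_id
buffer << "</element>"
same_object = buffer.object_id == id_before

copy = buffer + ""
different_object = copy.object_id != buffer.object_id
```

Both methods produce identical text; only the number of intermediate strings differs.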

It might also work better not to try to keep everything in a string at all, but instead to write it to disk immediately, appending to an open file as you go.
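A rough sketch of what that could look like follows. `write_report` and its record format are hypothetical, and the values are assumed to be XML-safe already (no escaping is done here). A Tempfile is used only so the example cleans up after itself.

```ruby
require "tempfile"

# Hypothetical helper: emits one <subelement> per record straight to `io`,
# so only the current record's markup is held in memory at any time.
# Assumes each record is already safe to embed in XML (no escaping here).
def write_report(io, records)
  io << "<xml>\n  <element>\n"
  records.each do |record|
    io << "    <subelement>#{record}</subelement>\n"
  end
  io << "  </element>\n</xml>\n"
end

# Usage: stream five records to a temporary file, then read it back.
output = Tempfile.new(["report", ".xml"])
begin
  write_report(output, 1..5)
  output.flush
  report_text = File.read(output.path)
ensure
  output.close
  output.unlink
end
```

In a real job you would probably open the destination directly, for example `File.open(path, "w") { |f| write_report(f, records) }`, and pass in an enumerator that fetches records in batches rather than loading them all up front.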

Again, without code examples, I'm shooting in the dark about what you might be doing and why things run slowly.
