Optimize reading from the database and writing to a CSV file

I am trying to read a large number of records from a database (over 100,000) and write them to a CSV file on an Ubuntu VPS. The server does not have enough memory for this.

I thought about reading 5000 rows at once and writing them to the file, then reading another 5000, and so on.

How can I restructure my current code so that it doesn't consume all of the memory?

Here is my code:

def write_rows(emails)
  File.open(file_path, "w+") do |f|
    f << "email,name,ip,created\n"
    emails.each do |l|
      f << [l.email, l.name, l.ip, l.created_at].join(",") + "\n"
    end
  end
end

The method is called from a Sidekiq worker:

 write_rows(user.emails) 
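Roughly, the worker looks like this (the class name and the user lookup are simplified for the example):

require 'sidekiq'

class ExportEmailsWorker
  include Sidekiq::Worker

  def perform(user_id)
    # assumes an ActiveRecord User model with an `emails` association
    user = User.find(user_id)
    write_rows(user.emails)
  end
end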

Thanks for the help!

1 answer

The problem is that when emails.each is called, ActiveRecord loads all of the records from the database and keeps them in memory. To avoid this, you can use the find_each method:

require 'csv'

BATCH_SIZE = 5000

def write_rows(emails)
  CSV.open(file_path, 'w') do |csv|
    csv << %w{email name ip created}
    emails.find_each do |email|
      csv << [email.email, email.name, email.ip, email.created_at]
    end
  end
end

By default, find_each loads records in batches of 1000. If you want to load batches of 5000 records, pass the :batch_size option to find_each:

emails.find_each(:batch_size => 5000) do |email|
  ...
end

More information on the find_each method (and the related find_in_batches) can be found in the Ruby on Rails Guides.
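For completeness, here is a rough equivalent using find_in_batches, which yields each batch as an array instead of yielding single records (file_path is assumed to be defined elsewhere, as in the original code):

require 'csv'

def write_rows(emails)
  CSV.open(file_path, 'w') do |csv|
    csv << %w{email name ip created}
    # find_in_batches yields arrays of up to :batch_size records at a time,
    # so memory usage stays bounded by the batch size.
    emails.find_in_batches(:batch_size => 5000) do |batch|
      batch.each do |email|
        csv << [email.email, email.name, email.ip, email.created_at]
      end
    end
  end
end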

I used the CSV class to write the file instead of manually joining fields and lines. This is not a performance optimization, since writing the file should not be the bottleneck here.
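The practical benefit of the CSV class is correctness rather than speed: fields containing commas or quotes are escaped properly, which a plain join(",") does not handle. A small illustration (the sample values are made up):

require 'csv'

row = ['user@example.com', 'Doe, Jane', '203.0.113.7', '2014-01-01']

puts row.join(',')
# => user@example.com,Doe, Jane,203.0.113.7,2014-01-01   (the comma in the name breaks the columns)

puts CSV.generate_line(row)
# => user@example.com,"Doe, Jane",203.0.113.7,2014-01-01  (the name is quoted correctly)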

