Create a CSV and upload it to S3 from a background job

I give users the ability to download an extremely large amount of data as CSV. To do this, I use Sidekiq and push the task into a background job once it has been initiated. In the background job I generate a CSV containing all the necessary data, save it in /tmp, and then call save! on my model, passing the file location to the Paperclip attribute, which is then uploaded and stored in S3.

All of this works great locally. My problem is with Heroku and its ephemeral filesystem: files only persist for a short time, and only on the dyno that wrote them. My background job cannot find the tmp file it saved because of the way Heroku handles these files. I guess I'm looking for a better way to do this. If there is a way to do everything in memory, that would be awesome. The only problem is that Paperclip expects an actual file object as the attribute value when saving the model. Here's what my background job looks like:

    class CsvWorker
      include Sidekiq::Worker

      def perform(report_id)
        puts "Starting the jobz!"
        report = Report.find(report_id)
        items = query_ranged_downloads(report.start_date, report.end_date)
        csv = compile_csv(items)
        update_report(report.id, csv)
      end

      def update_report(report_id, csv)
        report = Report.find(report_id)
        report.update_attributes(csv: csv, status: true)
        report.save!
      end

      def compile_csv(items)
        clean_items = items.compact
        path = File.new("#{Rails.root}/tmp/uploads/downloads_by_title_#{Process.pid}.csv", "w")

        csv_string = CSV.open(path, "w") do |csv|
          csv << ["Item Name", "Parent", "Download Count"]
          clean_items.each do |row|
            if !row.item.nil? && !row.item.parent.nil?
              csv << [row.item.name, row.item.parent.name, row.download_count]
            end
          end
        end

        return path
      end
    end

I've omitted the query method for brevity.

1 answer

I don't think Heroku's temporary file storage is the problem here. The warnings around it mainly center on the facts that a) dynos are ephemeral, so anything you write may disappear without warning; and b) dynos are interchangeable, so counting on a temp file persisting between requests is a matter of luck when you have more than one dyno running. But temporary files do not simply vanish while your worker is still running.

One thing I notice is that you are actually opening the same file twice, ending up with two file descriptors on the same path:

    > path = File.new("/tmp/filename", "w")
    => #<File:/tmp/filename>
    > path.fileno
    => 3
    > CSV.open(path, "w") do |csv|
    >   csv << %w(foo bar baz)
    >   puts csv.fileno
    > end
    4
    => nil

You could change the path = line to just build the file name (instead of opening the file for writing), and then have update_report open that file name for reading. I haven't dug into what Paperclip does when you hand it an empty, already-superseded file handle, but changing this flow may well fix the problem.
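A minimal sketch of that first approach, reusing your worker's method names (the exact Paperclip assignment depends on your model, so treat this as an outline rather than a drop-in fix):

    # compile_csv now returns a plain file name; CSV.open is the only
    # thing that opens the file, so there is a single file descriptor.
    def compile_csv(items)
      clean_items = items.compact
      path = "#{Rails.root}/tmp/uploads/downloads_by_title_#{Process.pid}.csv"

      CSV.open(path, "w") do |csv|
        csv << ["Item Name", "Parent", "Download Count"]
        clean_items.each do |row|
          if !row.item.nil? && !row.item.parent.nil?
            csv << [row.item.name, row.item.parent.name, row.download_count]
          end
        end
      end

      path
    end

    # update_report opens the finished file for reading and hands the
    # handle to Paperclip; update_attributes saves the record itself.
    def update_report(report_id, path)
      report = Report.find(report_id)
      File.open(path) do |file|
        report.update_attributes(csv: file, status: true)
      end
    end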

Alternatively, you can do this in memory instead: generate the CSV as a string and pass it to Paperclip as a StringIO. (Paperclip supports some non-file objects, including StringIO, via e.g. Paperclip::StringioAdapter.) Try something like:

    # returns the CSV as a string
    def compile_csv(items)
      CSV.generate do |csv|
        # ...
      end
    end

    def update_report(report_id, csv)
      report = Report.find(report_id)
      report.update_attributes(csv: StringIO.new(csv), status: true)
      report.save!
    end
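The in-memory version sidesteps the dyno filesystem entirely, at the cost of holding the whole CSV in memory at once, which is worth keeping in mind given how large these reports can get. One caveat (an assumption worth verifying against your Paperclip version): a StringIO carries no file name, so Paperclip will fall back to a generic one; if the name of the downloaded file matters, set it explicitly on the attachment or wrap the string in an object that responds to original_filename.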