Convert Excel to CSV Efficiently in Ruby

I used the spreadsheet gem for this. It works, but it can sometimes be very slow. I also tried the Roo gem, but that did not improve performance. Is there a better way to do this job? The strange thing is that some worksheets in the same Excel file are fast, while others are very slow, taking up to 1 hour.

Is it possible to use OpenOffice to open each worksheet (tab) in an Excel file and convert it to CSV much faster? If so, how do I do this in Ruby?

Or is there an even better solution?

Here is a small example that I tried with the Roo gem:

    xls = Roo::Excel.new(source_excel_file)
    xls.each_with_pagename do |name, sheet|
      # p sheet.to_csv(File.join(dest_csv_dir,name + ".csv"))
      #sheet.parse(:clean => true)#.to_csv(File.join(dest_csv_dir,name + ".csv"))
      puts name
      puts sheet.parse(:clean => true)
    end
+8
ruby ruby-on-rails export-to-csv import-from-excel
4 answers

Cowardly preface: I am SUPER new to Ruby and know almost nothing about Rails, but I have wrestled with Excel before. I created a mock workbook on my local machine with 5 sheets, each containing 10 columns and 1000 rows of randomly generated numbers. I converted each sheet to its own CSV using this:

    require 'win32ole'
    require 'csv'

    # Open the workbook and turn off Excel alerts
    xl = WIN32OLE.new('excel.application')
    book = xl.workbooks.open('C:\stack\my_workbook.xlsx')
    xl.displayalerts = false

    # Loop through all worksheets in the Excel file
    book.worksheets.each do |sheet|
      # Find the last used row and column by searching backwards
      # (searchdirection 2) by rows (searchorder 1) and by columns (searchorder 2)
      last_row = sheet.cells.find(what: '*', searchorder: 1, searchdirection: 2).row
      last_col = sheet.cells.find(what: '*', searchorder: 2, searchdirection: 2).column

      # The block form of File.open closes the CSV file once the sheet is written
      File.open('C:\\stack\\' + sheet.name + '.csv', 'w+') do |export|
        # Loop through each column in each row and write to CSV
        (1..last_row).each do |xlrow|
          csv_row = (1..last_col).map { |xlcol| sheet.cells(xlrow, xlcol).value }
          export << CSV.generate_line(csv_row)
        end
      end
    end

    # Clean up: close without saving (boolean false, not the string 'false')
    book.close(savechanges: false)
    xl.displayalerts = true
    xl.quit

Timed by eye, this script took about 30 seconds, with each run coming in a few seconds above or below that.
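The script above makes one COM call per cell. A common variation, which is not what this answer measured, is to read the whole used range in one call and build the CSV in Ruby. The sketch below assumes that the Value of a multi-cell Range comes back as an array of row arrays; the helper names rows_to_csv and export_sheet_in_bulk are made up for illustration.

```ruby
require 'csv'

# Turn an array of row arrays into CSV text, one line per row.
def rows_to_csv(rows)
  rows.map { |row| CSV.generate_line(row) }.join
end

# Hypothetical helper: `sheet` is expected to be a worksheet object obtained
# through WIN32OLE, as in the script above.
def export_sheet_in_bulk(sheet, csv_path)
  # One COM call for the whole used range instead of one call per cell.
  values = sheet.usedrange.value
  # Assumption: a single-cell range yields a scalar rather than nested arrays.
  values = [[values]] unless values.is_a?(Array)
  File.write(csv_path, rows_to_csv(values))
end
```

Inside the worksheet loop this would replace the row and column loops with a call such as export_sheet_in_bulk(sheet, 'C:\\stack\\' + sheet.name + '.csv'). Note that the used range does not necessarily start at cell A1, so leading empty rows or columns may be dropped compared with the original loop.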

+2

I assume that we are talking about the old Excel (xls) format; it seems that the spreadsheet gem cannot work with xlsx anyway.

I would try one of the command-line converters available as distribution packages: either xls2csv from the catdoc package (very fast, although not every Excel file is processed successfully), or ssconvert from the gnumeric package (moderate speed, and it requires installing the entire Gnumeric, which is sometimes not an option on a server, but it is really reliable).
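As a rough sketch of how ssconvert could be called from Ruby: the -S (--export-file-per-sheet) option writes one output file per worksheet and substitutes the sheet name for %s in the output name. The method names below are made up for illustration, and the sketch assumes ssconvert is installed and on the PATH.

```ruby
# Build the ssconvert argument list for exporting one CSV per worksheet.
# With -S, the output name is a template: %s becomes the sheet name.
def ssconvert_command(source_excel_file, dest_csv_dir)
  ['ssconvert', '-S', source_excel_file, File.join(dest_csv_dir, '%s.csv')]
end

# Run the conversion. Passing separate arguments to system bypasses the
# shell, so spaces in file names and the literal %s need no quoting.
# Returns true on success, false on a non-zero exit status, and nil if the
# command could not be executed (for example, ssconvert is not installed).
def convert_with_ssconvert(source_excel_file, dest_csv_dir)
  system(*ssconvert_command(source_excel_file, dest_csv_dir))
end
```

A call such as convert_with_ssconvert(source_excel_file, dest_csv_dir) would then leave one CSV per tab in the destination directory, which matches what the Roo loop in the question was trying to produce.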

NB: When parsing Excel files, Roo simply requires the spreadsheet gem and wraps its API, so it can never be faster or more reliable than spreadsheet itself.

NB2: If I remember correctly (though it was many years ago), trying to automate OpenOffice from Ruby was a) really complicated and b) very slow.

+2

Make sure you are using an up-to-date version of Roo (1.13.2).

Also make sure you are using the patch that skips trailing blank lines:

https://github.com/Empact/roo/blob/master/lib/roo/worksheet.rb
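To illustrate the idea of skipping trailing blank lines independently of Roo, here is a minimal sketch that operates on plain arrays of rows. It is not the code from the linked patch; blank_cell? and strip_trailing_blank_rows are hypothetical names used only for this example.

```ruby
# A cell counts as blank when it is nil or contains only whitespace.
def blank_cell?(cell)
  cell.nil? || cell.to_s.strip.empty?
end

# Drop blank rows at the end of the sheet while keeping blank rows that
# sit between rows with data.
def strip_trailing_blank_rows(rows)
  # Index of the last row that has at least one non-blank cell.
  last = rows.rindex { |row| !row.all? { |cell| blank_cell?(cell) } }
  last ? rows[0..last] : []
end
```

Applied to rows read from a worksheet before writing the CSV, this keeps the output from ending in a long run of empty lines.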

If you can post one of the spreadsheets that takes a long time to parse, that would help people here help you. Just remember to remove any sensitive data.

+1
    require 'roo'
    require 'csv'

    xls_file = Roo::Excelx.new('test.xlsx')
    # Open the CSV for writing; CSV.open defaults to read mode
    CSV.open('test.csv', 'w') do |csv|
      # Start at row 2 to skip the header; use (1..xls_file.last_row) to keep it
      (2..xls_file.last_row).each do |i|
        csv << xls_file.row(i)
      end
    end
+1
