Parsing XLS and XLSX files (MS Excel) using Ruby?

Are there any gems capable of parsing both XLS and XLSX files? I found Spreadsheet and ParseExcel, but neither of them understands the XLSX format.

+69
ruby excel
Jul 23 '10 at 18:00
10 answers

I just found roo, which can do the job. It works for my requirements, which amount to reading a basic spreadsheet.

+52
Oct 18 2018-10-18

I recently needed to parse some Excel files using Ruby. The abundance of libraries and options was confusing, so I wrote a blog post about it.

Here is a table of various Ruby libraries and what they support:

[image: table of Ruby libraries and the formats they support]

If you care about performance, here is how the xlsx libraries compare:

[image: performance comparison of the xlsx libraries]

I have sample code for reading xlsx files with every supported library here

Here are some examples for reading xlsx files with several different libraries:

rubyXL

    require 'rubyXL'

    workbook = RubyXL::Parser.parse './sample_excel_files/xlsx_500_rows.xlsx'
    worksheets = workbook.worksheets
    puts "Found #{worksheets.count} worksheets"

    worksheets.each do |worksheet|
      puts "Reading: #{worksheet.sheet_name}"
      num_rows = 0
      worksheet.each do |row|
        row_cells = row.cells.map{ |cell| cell.value }
        num_rows += 1
      end
      puts "Read #{num_rows} rows"
    end

roo

    require 'roo'

    workbook = Roo::Spreadsheet.open './sample_excel_files/xlsx_500_rows.xlsx'
    worksheets = workbook.sheets
    puts "Found #{worksheets.count} worksheets"

    worksheets.each do |worksheet|
      puts "Reading: #{worksheet}"
      num_rows = 0
      workbook.sheet(worksheet).each_row_streaming do |row|
        row_cells = row.map { |cell| cell.value }
        num_rows += 1
      end
      puts "Read #{num_rows} rows"
    end

creek

    require 'creek'

    workbook = Creek::Book.new './sample_excel_files/xlsx_500_rows.xlsx'
    worksheets = workbook.sheets
    puts "Found #{worksheets.count} worksheets"

    worksheets.each do |worksheet|
      puts "Reading: #{worksheet.name}"
      num_rows = 0
      worksheet.rows.each do |row|
        row_cells = row.values
        num_rows += 1
      end
      puts "Read #{num_rows} rows"
    end

simple_xlsx_reader

    require 'simple_xlsx_reader'

    workbook = SimpleXlsxReader.open './sample_excel_files/xlsx_500000_rows.xlsx'
    worksheets = workbook.sheets
    puts "Found #{worksheets.count} worksheets"

    worksheets.each do |worksheet|
      puts "Reading: #{worksheet.name}"
      num_rows = 0
      worksheet.rows.each do |row|
        row_cells = row
        num_rows += 1
      end
      puts "Read #{num_rows} rows"
    end

Here is an example of reading a legacy xls file using the spreadsheet library:

spreadsheet

    require 'spreadsheet'

    # Note: spreadsheet only supports .xls files (not .xlsx)
    workbook = Spreadsheet.open './sample_excel_files/xls_500_rows.xls'
    worksheets = workbook.worksheets
    puts "Found #{worksheets.count} worksheets"

    worksheets.each do |worksheet|
      puts "Reading: #{worksheet.name}"
      num_rows = 0
      worksheet.rows.each do |row|
        row_cells = row.to_a.map{ |v| v.methods.include?(:value) ? v.value : v }
        num_rows += 1
      end
      puts "Read #{num_rows} rows"
    end

+69
Mar 22 '17 at 13:59

The roo gem works great for Excel (.xls and .xlsx) and is under active development.

I agree that the syntax is not great and not very Ruby-like. But that can easily be fixed with something like:

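    require 'roo'

    class Spreadsheet
      def initialize(file_path)
        @xls = Roo::Spreadsheet.open(file_path)
      end

      # Makes each sheet the default sheet in turn and yields its name.
      def each_sheet
        @xls.sheets.each do |sheet|
          @xls.default_sheet = sheet
          yield sheet
        end
      end

      # roo numbers rows and columns from 1, so start at first_row / first_column
      # rather than 0.
      def each_row
        @xls.first_row.upto(@xls.last_row) do |index|
          yield @xls.row(index)
        end
      end

      def each_column
        @xls.first_column.upto(@xls.last_column) do |index|
          yield @xls.column(index)
        end
      end
    end
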
+42
Dec 27 '12 at 19:57

I use creek, which uses nokogiri. It is fast: it took 8.3 seconds on a 21x11250 xlsx table on my MacBook Air. I got it working on Ruby 1.9.3+. The output format for each row is a hash of cell name to cell contents: {"A1" => "cell", "B1" => "other cell"}. The hash makes no guarantee that the keys will be in the original column order. https://github.com/pythonicrubyist/creek
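
Since each row is a hash keyed by cell reference with no guaranteed key order, one way to get the values back in column order is to sort on the column letters. This is a minimal sketch in plain Ruby; the helper names column_index and ordered_values are made up for illustration and are not part of creek:

```ruby
# Convert a column label such as "A", "Z" or "AA" to a 1-based index.
def column_index(letters)
  letters.each_char.reduce(0) { |acc, ch| acc * 26 + (ch.ord - 'A'.ord + 1) }
end

# Sort a {"B1" => ..., "A1" => ...} row hash by column and return the values.
def ordered_values(row)
  row.sort_by { |ref, _| column_index(ref[/\A[A-Z]+/]) }.map { |_, value| value }
end

# Hypothetical row, shaped like the hash described above.
row = { "B1" => "other cell", "A1" => "cell", "AA1" => "far right" }
p ordered_values(row)  # prints ["cell", "other cell", "far right"]
```

Sorting on the numeric column index rather than on the raw key keeps "AA" after "Z", which a plain string sort would get wrong.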

dullard is another great one that uses nokogiri. It is super fast: it took 6.7 seconds on a 21x11250 xlsx table on my MacBook Air. I got it working on Ruby 2.0.0+. The output format for each row is an array: ["cell", "other cell"] https://github.com/thirtyseven/dullard

The simple_xlsx_reader that has been mentioned is great, but a bit slow. It took 91 seconds on a 21x11250 xlsx table on my MacBook Air. I got it working on Ruby 1.9.3+. The output format for each row is an array: ["cell", "other cell"] https://github.com/woahdae/simple_xlsx_reader

Another interesting one is Oxcelix. It uses ox's SAX parser, which is supposedly faster than both nokogiri's DOM and SAX parsers. It supposedly outputs a Matrix. I could not get it to work, and there were also some problems with the rubyzip dependency. I would not recommend it.

In conclusion, creek seems to be a good choice. Other posts recommend simple_xlsx_parser as it has similar performance.

I removed the recommendation for dullard since it is outdated and people run into errors and problems with it.

+24
Jan 04 '14 at 23:59

If you are looking for more modern libraries, take a look at Spreadsheet: http://spreadsheet.rubyforge.org/GUIDE_txt.html. I can't say whether it supports XLSX files, but given that it is under active development, I assume it does (I am not on Windows and don't have Office, so I can't check).

At this point, roo looks like a good option. It supports XLSX and allows (some) iteration by using times with cell access. I admit it is not pretty.

In addition, RubyXL can now give you a kind of iteration through its extract_data method, which returns a 2D array of data that can easily be iterated over.

Alternatively, if you are trying to work with XLSX files on Windows, you can use Ruby's WIN32OLE library, which lets you interact with OLE objects such as those provided by Word and Excel. However, as @PanagiotisKanavos mentions in the comments, this has several serious drawbacks:
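
As a rough illustration of iterating such a 2D array: the data literal below is only a stand-in for what extract_data might return (the nil for a blank row is an assumption), so the sketch runs without rubyXL installed:

```ruby
# Stand-in for the result of worksheet.extract_data (an assumption):
# an array of rows, where a blank row may show up as nil.
data = [
  ["name", "qty"],
  ["apple", 3],
  nil,
  ["pear", 5]
]

# Treat the first row as the header and pair it with every remaining row.
header, *rows = data
records = rows.compact.map { |row| header.zip(row).to_h }

records.each { |record| puts record.inspect }
```

Dropping nil rows with compact before zipping avoids a NoMethodError on blank rows.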

  • You must have Excel installed.
  • A new instance of Excel is launched for each document.
  • The consumption of memory and other resources is much greater than what is needed for simple processing of XLSX documents.

But if you do decide to use it, you can keep Excel hidden, load your XLSX file, and access it through the OLE object. I'm not sure whether it supports iteration, but I don't think it would be too hard to build that on top of the supplied methods, since this is the full Microsoft OLE API for Excel. Here's the documentation: http://support.microsoft.com/kb/222101 Here's the library: http://www.ruby-doc.org/stdlib-1.9.3/libdoc/win32ole/rdoc/WIN32OLE.html

Again, the options don't look great, but I'm afraid there aren't many others. It is hard to parse a file format that is a black box, and the few who managed to crack it did not do so very visibly. Google Docs is closed source, and LibreOffice is thousands of lines of hairy C++.

+6
Dec 25 2018-12-12T00:

The rubyXL gem parses XLSX files perfectly.

+4
Oct. 14 '11 at 18:39

I have been working heavily with both Spreadsheet and rubyXL over the last couple of weeks, and I have to say that both are great tools. However, one area where both suffer is the lack of examples showing how to actually implement something useful. I am currently building a crawler, using rubyXL to parse xlsx files and Spreadsheet for anything xls. I hope the code below can serve as a useful example and show how effective these tools are.

    require 'find'
    require 'rubyXL'

    count = 0

    # iterate over each file of the specified directory
    Find.find('/Users/Anconia/crawler/') do |file|
      # check whether the file is in xlsx format
      if file =~ /\.xlsx\z/
        # an array containing all worksheets of the Excel workbook
        workbook = RubyXL::Parser.parse(file).worksheets
        # iterate over each worksheet
        workbook.each do |worksheet|
          # extract the data of the worksheet; it must be converted to a string to match a regex
          data = worksheet.extract_data.to_s
          if data =~ /regex/
            puts file
            count += 1
          end
        end
      end
    end

    puts "#{count} files were found"



    require 'find'
    require 'spreadsheet'

    Spreadsheet.client_encoding = 'UTF-8'
    count = 0

    # iterate over each file of the specified directory
    Find.find('/Users/Anconia/crawler/') do |file|
      # check whether the file is in xls format
      if file =~ /\.xls\z/
        # an array containing all worksheets of the Excel workbook
        workbook = Spreadsheet.open(file).worksheets
        # iterate over each worksheet
        workbook.each do |worksheet|
          # iterate over each row of the worksheet
          worksheet.each do |row|
            # rows must be converted to strings in order to match the regex
            if row.to_s =~ /regex/
              puts file
              count += 1
            end
          end
        end
      end
    end

    puts "#{count} files were found"

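
Both crawler scripts pick a gem based on the file extension. Here is a hedged sketch of that dispatch using File.extname from the standard library; parser_for and the symbols it returns are hypothetical labels for illustration, not part of either gem:

```ruby
# Map a file path to a label for the gem this answer uses for that format.
# Returns nil for anything that is neither .xlsx nor .xls.
def parser_for(path)
  case File.extname(path).downcase
  when '.xlsx' then :rubyxl
  when '.xls'  then :spreadsheet
  end
end

# Hypothetical file names, only to show the three possible outcomes.
%w[report.xlsx legacy.XLS notes.txt].each do |path|
  puts "#{path}: #{parser_for(path).inspect}"
end
```

Comparing the extension exactly, rather than with a loose regex, keeps a name like "myxls" from being mistaken for a spreadsheet.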
+4
Dec 28 '12 at 16:25

I could not find a satisfactory xlsx parser. RubyXL does not do date typecasting, Roo tried to typecast a number as a date, and both are a mess in both API and code.

So, I wrote simple_xlsx_reader. You would have to use something else for xls, though, so this may not be the complete answer you are looking for.
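
For context on the date problem: Excel stores dates as serial day numbers, which is why a parser has to decide whether a given number is really a date. Below is a minimal sketch of the conversion, assuming the default 1900 date system (not the 1904 one) and serials of 61 or more; excel_serial_to_date is a made-up helper name, not part of any gem mentioned here:

```ruby
require 'date'

# In the 1900 date system, serials from 61 (1900-03-01) onward line up
# with a day count starting at 1899-12-30.
EXCEL_EPOCH = Date.new(1899, 12, 30)

# Convert a serial number to a Date; the fractional part (time of day) is dropped.
def excel_serial_to_date(serial)
  EXCEL_EPOCH + serial.floor
end

puts excel_serial_to_date(43831)  # prints 2020-01-01
```

Whether a cell should be converted at all depends on its number format, which is the part the parsers disagree on.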

+3
Jan 16 '13 at 17:24

Most online examples, including the author's website for the Spreadsheet gem, demonstrate reading the entire contents of an Excel file into RAM. That is fine if your spreadsheet is small.

 xls = Spreadsheet.open(file_path) 

For those who work with very large files, a better way is to stream-read the contents of the file. The Spreadsheet gem supports this, although it is not well documented at this time (circa 3/2015).

    # use each (rows on its own ignores the block)
    Spreadsheet.open(file_path).worksheets.first.each do |row|
      # do something with the row (an array of cell values)
    end

CITE: https://github.com/zdavatz/spreadsheet

+3
Mar 24 '15 at 6:01

The RemoteTable library uses roo internally. It makes it easy to read spreadsheets in various formats (XLS, XLSX, CSV, etc.), possibly remote and possibly stored inside a zip, gz, etc.:

    require 'remote_table'

    r = RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/02data.zip', :filename => 'guide_jan28.xls'
    r.each do |row|
      puts row.inspect
    end

Output:

    {"Class"=>"TWO SEATERS", "Manufacturer"=>"ACURA", "carline name"=>"NSX", "displ"=>"3.0", "cyl"=>"6.0", "trans"=>"Auto(S4)", "drv"=>"R", "bidx"=>"60.0", "cty"=>"17.0", "hwy"=>"24.0", "cmb"=>"20.0", "ucty"=>"19.1342", "uhwy"=>"30.2", "ucmb"=>"22.9121", "fl"=>"P", "G"=>"", "T"=>"", "S"=>"", "2pv"=>"", "2lv"=>"", "4pv"=>"", "4lv"=>"", "hpv"=>"", "hlv"=>"", "fcost"=>"1238.0", "eng dscr"=>"DOHC-VTEC", "trans dscr"=>"2MODE", "vpc"=>"4.0", "cls"=>"1.0"}
    {"Class"=>"TWO SEATERS", "Manufacturer"=>"ACURA", "carline name"=>"NSX", "displ"=>"3.2", "cyl"=>"6.0", "trans"=>"Manual(M6)", "drv"=>"R", "bidx"=>"65.0", "cty"=>"17.0", "hwy"=>"24.0", "cmb"=>"19.0", "ucty"=>"18.7", "uhwy"=>"30.4", "ucmb"=>"22.6171", "fl"=>"P", "G"=>"", "T"=>"", "S"=>"", "2pv"=>"", "2lv"=>"", "4pv"=>"", "4lv"=>"", "hpv"=>"", "hlv"=>"", "fcost"=>"1302.0", "eng dscr"=>"DOHC-VTEC", "trans dscr"=>"", "vpc"=>"4.0", "cls"=>"1.0"}
    {"Class"=>"TWO SEATERS", "Manufacturer"=>"ASTON MARTIN", "carline name"=>"ASTON MARTIN VANQUISH", "displ"=>"5.9", "cyl"=>"12.0", "trans"=>"Auto(S6)", "drv"=>"R", "bidx"=>"1.0", "cty"=>"12.0", "hwy"=>"19.0", "cmb"=>"14.0", "ucty"=>"13.55", "uhwy"=>"24.7", "ucmb"=>"17.015", "fl"=>"P", "G"=>"G", "T"=>"", "S"=>"", "2pv"=>"", "2lv"=>"", "4pv"=>"", "4lv"=>"", "hpv"=>"", "hlv"=>"", "fcost"=>"1651.0", "eng dscr"=>"GUZZLER", "trans dscr"=>"CLKUP", "vpc"=>"4.0", "cls"=>"1.0"}

+2
Jul 01 '13 at 17:35