Today I needed to parse some data from an xlsx file (Office Open XML spreadsheet). I could just open the file in OpenOffice and export it to CSV. However, I will need to reimport data from this spreadsheet later, and I would like to avoid the manual step.
I searched the web for xlsx parsing, and all I found was a Stack Overflow question asking the same thing: Parsing and creating Microsoft Office 2007 files (.docx, .xlsx, .pptx)
So I rolled my own.
It came to 134 lines of code for parsing and accessing a spreadsheet, plus 54 lines of unit tests. Of course, it has only been tested against the one file I need, and apart from how it is used in the unit tests, there is currently no documentation. It uses only zipfile, minidom, re and unittest from the standard library, so it should be fully portable and platform independent.
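An .xlsx file is a ZIP archive of XML parts, which is why zipfile and minidom are enough. The following is a minimal sketch, not the code from the repository, of reading cell values from one worksheet. It assumes the common part names xl/sharedStrings.xml and xl/worksheets/sheet1.xml (a fuller reader would resolve sheet paths through xl/workbook.xml), assumes unprefixed tags in the default namespace, and only handles shared strings, formula strings and plain numbers. The names read_sheet and _text are hypothetical.

```python
import zipfile
from xml.dom import minidom


def _text(node):
    """Join the direct text children of a DOM element (e.g. <t> or <v>)."""
    return "".join(
        child.data for child in node.childNodes if child.nodeType == child.TEXT_NODE
    )


def read_sheet(source, sheet_part="xl/worksheets/sheet1.xml"):
    """Return {cell reference: value} for one worksheet of an .xlsx file.

    Assumptions (not from the original code): the sheet is stored at
    `sheet_part`, and the XML uses the default namespace, so tags have no
    prefix. A fuller reader would resolve sheet paths via xl/workbook.xml.
    """
    with zipfile.ZipFile(source) as archive:
        shared = []
        if "xl/sharedStrings.xml" in archive.namelist():
            sst = minidom.parseString(archive.read("xl/sharedStrings.xml"))
            # Each <si> is one shared string; rich text splits it into several <t> runs.
            shared = [
                "".join(_text(t) for t in si.getElementsByTagName("t"))
                for si in sst.getElementsByTagName("si")
            ]
        sheet = minidom.parseString(archive.read(sheet_part))

    cells = {}
    for c in sheet.getElementsByTagName("c"):
        v = c.getElementsByTagName("v")
        if not v:
            continue  # empty or inline-string cell: not handled in this sketch
        ref, raw, cell_type = c.getAttribute("r"), _text(v[0]), c.getAttribute("t")
        if cell_type == "s":
            cells[ref] = shared[int(raw)]  # <v> holds an index into the shared strings
        elif cell_type in ("str", "e"):
            cells[ref] = raw  # formula string result or error code such as #N/A
        else:
            number = float(raw)  # plain numbers (and booleans, stored as 0/1)
            cells[ref] = int(number) if number.is_integer() else number
    return cells
```

Usage would be something like read_sheet("data.xlsx")["A1"], where data.xlsx is a placeholder name. Shared strings are looked up by index because Excel stores each distinct string once and has cells refer to it.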
Since I don't have a blog, and I have no desire to turn this into a Python library for Office Open XML, I got stuck wondering where to post this code. I have solved a problem that I am sure others will run into in the future, so I want to publish my code somewhere in the public domain, where someone can copy and paste it into their application and adapt it to their needs.
The implementation is simple, and here is a quick overview of the features:
```python
workbook = Workbook(filename)   # open a file
for sheet in workbook:          # iterate over the worksheets
    pass
sheet = workbook["sheetname"]   # access a sheet by name; access by 0-based index also works
cell = sheet["A1"]              # access a cell
column = sheet["A"]             # access a column
row = sheet["1"]                # access a row
cell.value                      # cell value; only tested with ints and strings
```
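The overview accepts three kinds of string keys on a sheet: a cell ("A1"), a column ("A") and a row ("1"). One plausible way to tell them apart with re is to split a key into its column letters and row digits. This is a sketch with hypothetical names (parse_key, column_index), not the repository's code, and it assumes 1-based column and row numbers as used in Excel references.

```python
import re

# Hypothetical pattern: optional column letters followed by optional row digits.
_KEY = re.compile(r"([A-Z]*)([0-9]*)")


def column_index(letters):
    """Convert column letters to a 1-based index: A -> 1, Z -> 26, AA -> 27."""
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def parse_key(key):
    """Classify a sheet key as ("cell", col, row), ("column", col, None) or ("row", None, row)."""
    match = _KEY.fullmatch(key.upper())
    if not key or match is None:
        raise KeyError(key)
    letters, digits = match.groups()
    col = column_index(letters) if letters else None
    row = int(digits) if digits else None
    if col is not None and row is not None:
        return ("cell", col, row)
    if col is not None:
        return ("column", col, None)
    return ("row", None, row)
```

A sheet's __getitem__ could then dispatch on the first element of the returned tuple. Using fullmatch means malformed keys such as "1A" raise KeyError instead of being silently misread.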
Thanks for all the answers. I was about to post it on ActiveState, but the site kept crashing when sending me the activation email, so I could not activate my account in order to publish the code.
My second choice was CodeProject, and I wrote a good article about the code. Unfortunately, the site crashes whenever I try to submit my post.
So I put it on GitHub for anyone to view and fork: http://github.com/staale/python-xlsx/tree/master
I don't want to do all the work of hosting a proper Python project, so this will have to do.
I'm accepting the git answer, since it is the only thing that worked for me. And git rocks.
Edit: Gah, I lost the whole post on CodeProject, and I had written such a good write-up. Screw it, I have spent more time trying to share this than it took to code it. So as far as I'm concerned, this is done as it is now, unless I decide to tweak it later.