This is difficult to do because Microsoft recommends using the Office engine, which works great on the desktop but is useless for the web.
The binary .xls file format is painful to work with, which is why Microsoft introduced the OpenXML .xlsx files in Office 2007.
.xlsx is supposedly simple: it's just a zip container full of XML files. You can open it with System.IO.Packaging and edit it with System.Xml. There is even a compatibility pack for older versions of Office.
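To make the "zip container full of XML files" point concrete, here is a minimal sketch. It uses Python's standard library instead of System.IO.Packaging and System.Xml, purely so it runs without any setup. The helper names (list_parts, sheet_names, make_demo_container) are made up for illustration, the demo container is a tiny stand-in rather than a complete workbook, and it assumes the workbook part sits at the conventional xl/workbook.xml path (a real package locates it through its relationship parts).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the main SpreadsheetML parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def list_parts(xlsx):
    """Return the names of the parts stored in an .xlsx container."""
    with zipfile.ZipFile(xlsx) as zf:
        return zf.namelist()


def sheet_names(xlsx):
    """Read the sheet names from xl/workbook.xml (assumed conventional path)."""
    with zipfile.ZipFile(xlsx) as zf:
        root = ET.fromstring(zf.read("xl/workbook.xml"))
    return [s.get("name") for s in root.findall("m:sheets/m:sheet", NS)]


def make_demo_container():
    """Build a tiny in-memory stand-in for the demo (not a complete workbook)."""
    workbook = (
        '<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
        '<sheets><sheet name="Sheet1" sheetId="1"/></sheets></workbook>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("xl/workbook.xml", workbook)
    buf.seek(0)
    return buf


demo = make_demo_container()
print(list_parts(demo))
print(sheet_names(demo))
```

Both functions also accept a path to a real .xlsx file, where the part list will be considerably longer (styles, shared strings, relationships, one part per sheet, and so on).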
Unfortunately, it is not that easy: the .xlsx format is terrible beyond words.
It looks like they took the 16-bit optimized binary .xls format (originally developed for Windows 3.1) and serialized it as XML. Then they added really stupid changes, such as making cell comments VML, a format supposedly dropped back in IE5! They also added a ton of magic numbers and metadata to the XML, so you cannot run any transformations on it and have to pick it apart by hand.
Finally, they made it a complete pain to debug, and we regularly find .xlsx files that the compatibility pack reports as corrupted (for no apparent reason) but that the latest version of Office opens without complaint.
There is a really good open source library for this: SpreadsheetLight. Even so, anything that requires you to dig into and tweak the .xlsx files yourself will be painful.
Keith