I use xlrd to process .xls files and openpyxl to process .xlsx files, and this works well.
Then I was handed an .xls file, so I tried xlrd.open_workbook() and got:
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '<?xml ve'
I looked at this question, and I suspect that my file, despite its .xls extension, is actually an .xlsx. Indeed, I can view it in a text editor:
<?xml version="1.0" encoding="UTF-8"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40">
:
:
:
(for privacy reasons, I cannot publish the whole file, but it is probably not required for our analysis).
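To avoid guessing from the extension, the first bytes of the file can be checked directly. Below is a minimal sketch, assuming the usual signatures (an OLE2 compound document for a legacy .xls, a zip archive for an .xlsx). The function name sniff_spreadsheet_format is made up for illustration and is not part of xlrd or openpyxl:

```python
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy binary .xls container
ZIP_MAGIC = b"PK\x03\x04"                          # .xlsx is a zip archive
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_spreadsheet_format(path):
    """Guess the real format from the file's first bytes (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(64)
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    # Skip an optional UTF-8 byte order mark and leading whitespace.
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    if head.lstrip().startswith(b"<?xml"):
        return "xml"
    return "unknown"
```

On my file this should report "xml", which would match both errors: xlrd found no BOF record and openpyxl found no zip archive.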
So I figured that if I just copied (cp) it to a .xlsx name, I should be able to open it with openpyxl.load_workbook(), but I get:
BadZipfile: File is not a zip file
If it is really an xls (unlikely), it cannot be opened with xlrd; if it is really an xlsx, it cannot be opened with openpyxl, even after I cp it to a .xlsx name. What should I do?
Note: if I open the .xls in Excel, save it as .xlsx, and try again with openpyxl, it does load, but that manual step is a luxury I will not have when my program runs.
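The Workbook root element and its urn:schemas-microsoft-com:office:spreadsheet namespace look like the Excel 2003 XML Spreadsheet format (SpreadsheetML), which neither xlrd nor openpyxl reads. One possible way around the manual step is to parse the file with the standard library. This is only a sketch: it assumes the usual Worksheet/Table/Row/Cell/Data layout, which I cannot confirm from the part of the file I left out, and read_spreadsheetml is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}


def read_spreadsheetml(path):
    """Return {sheet_name: [[cell_text, ...], ...]} (hypothetical helper).

    Assumes the Worksheet/Table/Row/Cell/Data layout; values come back
    as strings (or None for empty cells), with no type conversion.
    """
    root = ET.parse(path).getroot()
    sheets = {}
    for ws in root.findall("ss:Worksheet", NS):
        name = ws.get(f"{{{SS}}}Name")
        rows = []
        for row in ws.findall("ss:Table/ss:Row", NS):
            values = []
            for cell in row.findall("ss:Cell", NS):
                # ss:Index is a 1-based column number used when empty
                # cells were skipped; pad the gap with None.
                index = cell.get(f"{{{SS}}}Index")
                if index is not None:
                    values.extend([None] * (int(index) - 1 - len(values)))
                data = cell.find("ss:Data", NS)
                values.append(data.text if data is not None else None)
            rows.append(values)
        sheets[name] = rows
    return sheets
```

Numbers and dates are left as text here; the ss:Type attribute on each Data element would be the place to look if typed values are needed.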