Read / parse Excel files (xls) using Python

What is the best way to read Excel files (.xls) with Python (not CSV files)?

Is there a built-in package that is supported by default in Python for this task?

+105
python xls
May 31 '10 at 10:28
9 answers

I highly recommend xlrd for reading .xls files.

Voyager mentioned the use of COM automation. Having done this myself a few years ago, I warn you that it is a real PITA. The number of caveats is huge, and the documentation is lacking and annoying. I ran into many strange bugs and gotchas, some of which took many hours to figure out.

UPDATE: For newer .xlsx files, the recommended library for reading and writing appears to be openpyxl (thanks, Ikar Pokhorski).
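
As a rough sketch of that split, the file extension can decide which library to load. The helper names `pick_reader` and `sheet_names` are hypothetical, and both xlrd and openpyxl are third-party packages that must be installed separately.

```python
import os


def pick_reader(path):
    """Return the name of the library recommended above for this file type."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".xls":
        return "xlrd"
    if ext == ".xlsx":
        return "openpyxl"
    raise ValueError("unsupported extension: " + ext)


def sheet_names(path):
    """List the sheet names of a workbook, importing the library lazily."""
    if pick_reader(path) == "xlrd":
        import xlrd  # third-party: pip install xlrd
        return xlrd.open_workbook(path).sheet_names()
    import openpyxl  # third-party: pip install openpyxl
    return openpyxl.load_workbook(path, read_only=True).sheetnames
```

Calling `sheet_names("report.xls")` would go through xlrd, while a path ending in `.xlsx` would go through openpyxl.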

+89
May 31 '10 at 12:24

Using pandas:

 import pandas as pd

 xls = pd.ExcelFile("yourfilename.xls")
 sheetX = xls.parse(2)  # 2 is the sheet number (zero-based index)
 var1 = sheetX['ColumnName']
 print(var1[1])  # 1 is the row number...
+40
May 23 '17 at 4:04

You can choose any of the libraries listed at http://www.python-excel.org/
I would recommend the xlrd library.

install it with

 pip install xlrd 

import using

 import xlrd 

open a workbook (xlrd 2.0 and later reads only .xls files)

 workbook = xlrd.open_workbook('your_file_name.xls') 

open sheet by name

 worksheet = workbook.sheet_by_name('Name of the Sheet') 

open sheet by index

 worksheet = workbook.sheet_by_index(0) 

read cell value

 worksheet.cell(0, 0).value 
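
Building on the calls above, a whole sheet can be read row by row instead of cell by cell. This is only a sketch: `rows_as_dicts` and `read_xls` are hypothetical helper names, and it assumes the first row of the sheet is a header row.

```python
def rows_as_dicts(worksheet):
    """Yield one dict per data row, keyed by the values in the first row.

    Works with any object exposing `nrows` and `row_values(i)`,
    as xlrd sheet objects do.
    """
    header = worksheet.row_values(0)  # assumption: row 0 holds column names
    for i in range(1, worksheet.nrows):
        yield dict(zip(header, worksheet.row_values(i)))


def read_xls(path, sheet_index=0):
    """Open a .xls file with xlrd and return its rows as a list of dicts."""
    import xlrd  # third-party, imported lazily: pip install xlrd
    workbook = xlrd.open_workbook(path)
    return list(rows_as_dicts(workbook.sheet_by_index(sheet_index)))
```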
+22
Apr 6 '17 at 14:15

I think pandas is the best way. There is already one answer here that uses the pandas ExcelFile function, but it did not work properly for me. I found the read_excel function, which works great:

 import pandas as pd

 dfs = pd.read_excel("your_file_name.xlsx", sheet_name="your_sheet_name")
 print(dfs.head(10))

P.S. To use the read_excel function, you must install xlrd (recent pandas versions use openpyxl for .xlsx files).
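
If the sheet name is not known in advance, read_excel can also load every sheet at once: passing `sheet_name=None` returns a dict that maps sheet names to DataFrames. The sketch below assumes pandas and a suitable engine are installed; `load_all_sheets` and `sheet_shapes` are hypothetical helper names.

```python
def load_all_sheets(path):
    """Read every sheet of a workbook into a dict of DataFrames."""
    import pandas as pd  # third-party, imported lazily
    return pd.read_excel(path, sheet_name=None)


def sheet_shapes(frames):
    """Summarize a dict of DataFrames as {sheet name: (rows, columns)}."""
    return {name: frame.shape for name, frame in frames.items()}
```

For example, `sheet_shapes(load_all_sheets("your_file_name.xlsx"))` gives a quick overview of what the workbook contains.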

+6
Jun 12 '18 at 10:35

You can use any of the existing libraries (for example, Pyxlreader, which is based on JExcelApi, or xlwt), or use COM automation to have Excel itself read the files. With COM, however, you introduce Office as a dependency of your software, which may not always be an option.

+1
May 31 '10 at 10:46

You may also consider running the (non-Python) xls2csv tool. Feed it an xls file and you should get CSV back.
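
A minimal sketch of driving such a tool from Python follows. It assumes an `xls2csv` executable is on the PATH and prints CSV to standard output when given a file name; check your version's manual, since options and behavior vary between implementations. The function names are hypothetical.

```python
import csv
import io
import subprocess


def parse_csv_text(text):
    """Parse CSV text into a list of rows (each row a list of strings)."""
    return list(csv.reader(io.StringIO(text)))


def xls_to_rows(path):
    """Convert a .xls file through the external tool and parse its output."""
    # Assumption: `xls2csv <file>` writes CSV to stdout and exits with 0.
    result = subprocess.run(
        ["xls2csv", path], capture_output=True, text=True, check=True
    )
    return parse_csv_text(result.stdout)
```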

+1
Nov 25

For xlsx, I like the solution posted earlier at https://stackoverflow.com/a/166189/. It uses only modules from the standard library.

 def xlsx(fname):
     import zipfile
     from xml.etree.ElementTree import iterparse
     z = zipfile.ZipFile(fname)
     strings = [el.text for e, el in iterparse(z.open('xl/sharedStrings.xml')) if el.tag.endswith('}t')]
     rows = []
     row = {}
     value = ''
     for e, el in iterparse(z.open('xl/worksheets/sheet1.xml')):
         if el.tag.endswith('}v'):
             value = el.text
         if el.tag.endswith('}c'):
             if el.attrib.get('t') == 's':
                 value = strings[int(value)]
             letter = el.attrib['r']
             while letter[-1].isdigit():
                 letter = letter[:-1]
             row[letter] = value
             value = ''
         if el.tag.endswith('}row'):
             rows.append(row)
             row = {}
     return rows

Improvements added: loading contents by sheet name, using re to get the column, and checking whether shared strings are used.

 def xlsx(fname, sheet):
     import zipfile
     from xml.etree.ElementTree import iterparse
     import re
     z = zipfile.ZipFile(fname)
     strings = []  # stays empty when the workbook has no shared strings
     if 'xl/sharedStrings.xml' in z.namelist():
         # Get shared strings
         strings = [element.text for event, element
                    in iterparse(z.open('xl/sharedStrings.xml'))
                    if element.tag.endswith('}t')]
     # Map each sheet name to its sheet id
     sheetdict = {element.attrib['name']: element.attrib['sheetId']
                  for event, element in iterparse(z.open('xl/workbook.xml'))
                  if element.tag.endswith('}sheet')}
     rows = []
     row = {}
     value = ''
     if sheet in sheetdict:
         sheetfile = 'xl/worksheets/sheet' + sheetdict[sheet] + '.xml'
         # print(sheet, sheetfile)
         for event, element in iterparse(z.open(sheetfile)):
             # get value or index to shared strings
             if element.tag.endswith('}v') or element.tag.endswith('}t'):
                 value = element.text
             # If value is a shared string, use value as an index
             if element.tag.endswith('}c'):
                 if element.attrib.get('t') == 's':
                     value = strings[int(value)]
                 # strip the digits from the cell reference to keep only the column letter(s)
                 letter = re.sub(r'\d', '', element.attrib['r'])
                 row[letter] = value
                 value = ''
             if element.tag.endswith('}row'):
                 rows.append(row)
                 row = {}
     return rows
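
Both versions key each row by column letters taken from cell references such as `AZ22`. If numeric column positions are more convenient, the reference can be split and the letters converted. The following is a small sketch with hypothetical helper names, assuming upper-case references without `$` markers.

```python
import re


def split_cell_ref(ref):
    """Split a cell reference like 'AZ22' into ('AZ', 22)."""
    match = re.fullmatch(r"([A-Z]+)(\d+)", ref)
    if match is None:
        raise ValueError("not a plain cell reference: " + ref)
    return match.group(1), int(match.group(2))


def column_index(letters):
    """Convert column letters to a zero-based index (A is 0, Z is 25, AA is 26)."""
    index = 0
    for ch in letters:
        # Column letters form a base-26 number whose digits run from 1 to 26.
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1
```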
+1
Oct 28 '18 at 11:53

For older Excel files, there is the OleFileIO_PL module, which can read the OLE structured storage format they use.
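
Whether a file uses that OLE container at all can be checked from its first bytes with the standard library alone. This sketch does not use OleFileIO_PL, and the function names are hypothetical. Note that the signatures identify only the container: other Office formats share them, so a match is a hint rather than proof that the file is a spreadsheet.

```python
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE structured storage header
ZIP_MAGIC = b"PK\x03\x04"                      # ZIP local file header


def sniff_container(header):
    """Classify the leading bytes of a file as 'ole', 'zip' or 'unknown'."""
    if header.startswith(OLE_MAGIC):
        return "ole"  # candidate for a legacy .xls workbook
    if header.startswith(ZIP_MAGIC):
        return "zip"  # candidate for a newer .xlsx workbook
    return "unknown"


def sniff_file(path):
    """Read the first 8 bytes of a file and classify its container."""
    with open(path, "rb") as handle:
        return sniff_container(handle.read(8))
```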

0
Sep 18 '13 at 20:35

Python Excelerator also performs this task. http://ghantoos.org/2007/10/25/python-pyexcelerator-small-howto/

It is also available on Debian and Ubuntu:

  sudo apt-get install python-excelerator 
0
Apr 08 '15 at 22:11
