Read / parse Excel files (xls) using Python

Question

What is the best way to read Excel files (.xls) with Python (not CSV files)?

Is there a built-in package that is supported by default in Python for this task?

qrbaQ May 31 '10 at 10:28

9 answers

taleinat · Answer 1 · 2010-05-31 12:24

I highly recommend xlrd for reading .xls files.

Voyager mentioned using COM automation. Having done this myself a few years ago, I warn you that it is a real PITA. The number of caveats is huge, and the documentation is lacking and annoying. I ran into many strange bugs and gotchas, some of which took hours to figure out.

UPDATE: For newer .xlsx files, the recommended library for reading and writing appears to be openpyxl (thanks, Ikar Pokhorski).
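As a minimal sketch of the openpyxl route, the helper below turns a worksheet into a list of dicts keyed by its header row. It assumes openpyxl 2.6 or later, where Worksheet.iter_rows accepts values_only=True, and that the first row holds the column names; rows_as_dicts is a hypothetical name, not part of openpyxl.

```python
from typing import Any, Dict, List


def rows_as_dicts(worksheet: Any) -> List[Dict[str, Any]]:
    """Turn a worksheet into a list of dicts keyed by the header row.

    Assumption: `worksheet` behaves like an openpyxl Worksheet, i.e. it
    provides iter_rows(values_only=True), yielding one tuple of cell
    values per row.
    """
    rows = worksheet.iter_rows(values_only=True)
    header = next(rows, None)
    if header is None:  # empty sheet
        return []
    # Empty header cells come back as None; give them positional names instead.
    keys = [str(h) if h is not None else f"col{i}" for i, h in enumerate(header)]
    # zip() stops at the shorter sequence, so short rows simply lose trailing keys.
    return [dict(zip(keys, row)) for row in rows]
```

Typical use, assuming a file book.xlsx with a sheet named Sheet1: wb = openpyxl.load_workbook("book.xlsx", read_only=True), then rows_as_dicts(wb["Sheet1"]).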

GPB83 · Answer 2 · 2017-05-23 04:04

Using pandas:

    import pandas as pd

    xls = pd.ExcelFile("yourfilename.xls")
    sheetX = xls.parse(2)  # 2 is the 0-based sheet index, i.e. the third sheet
    var1 = sheetX['ColumnName']
    print(var1[1])  # 1 is the row number...

somil · Answer 3 · 2017-04-06 14:15

You can choose any of the libraries listed at http://www.python-excel.org/
I would recommend the Python xlrd library.

install it with

 pip install xlrd

import using

 import xlrd

open a workbook

 workbook = xlrd.open_workbook('your_file_name.xls')  # xlrd 2.0 and later only read legacy .xls files

open sheet by name

 worksheet = workbook.sheet_by_name('Name of the Sheet')

open sheet by index

 worksheet = workbook.sheet_by_index(0)

read cell value

 worksheet.cell(0, 0).value
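Building on those calls, here is a minimal sketch of two helpers (hypothetical names sheet_to_rows and column_by_header) that rely only on the xlrd Sheet members nrows, row_values and col_values. It assumes the first row of the sheet holds column headers. Note that xlrd returns numeric cells as floats.

```python
from typing import Any, List


def sheet_to_rows(sheet: Any) -> List[List[Any]]:
    """Read every row of an xlrd Sheet into a list of lists."""
    return [sheet.row_values(rowx) for rowx in range(sheet.nrows)]


def column_by_header(sheet: Any, header: str) -> List[Any]:
    """Return the values below the header cell named `header`.

    Assumption: row 0 contains the column headers.
    """
    headers = sheet.row_values(0)
    try:
        colx = headers.index(header)
    except ValueError:
        raise KeyError(header) from None
    # Skip the header row itself.
    return sheet.col_values(colx, start_rowx=1)
```

For example, with workbook = xlrd.open_workbook('your_file_name.xls'), column_by_header(workbook.sheet_by_index(0), 'Name') returns the "Name" column without its header.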

Foad · Answer 4 · 2018-06-12 10:35

I think pandas is the best way. There is already an answer here that uses pandas' ExcelFile, but it did not work properly for me. From here, I found the read_excel function, which works great:

    import pandas as pd

    dfs = pd.read_excel("your_file_name.xlsx", sheet_name="your_sheet_name")
    print(dfs.head(10))

PS: read_excel needs an Excel reader package installed: xlrd for .xls files, while current pandas versions use openpyxl for .xlsx files.

Esteban Küber · Answer 5 · 2010-05-31 10:46

You can use any of several libraries (for example, Pyxlreader, which is based on JExcelApi, or xlwt), or use COM to have Excel itself read the files, but that makes Office a dependency of your software, which may not always be possible.

moi · Answer 6 · 2012-11-25 21:43

You may also consider running the (non-Python) xls2csv tool: give it an xls file and it outputs CSV.
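A minimal sketch of calling such a converter from Python and parsing its output with the standard csv module. It assumes the converter takes the file name as its last argument and writes CSV to standard output, as catdoc's xls2csv does by default; xls_to_rows and its command parameter are hypothetical. Multi-sheet workbooks may come out with a sheet separator (a form feed in catdoc's version), which this sketch does not split on.

```python
import csv
import io
import subprocess
from typing import List, Sequence


def xls_to_rows(path: str, command: Sequence[str] = ("xls2csv",)) -> List[List[str]]:
    """Run an external xls-to-CSV converter on `path` and parse its stdout.

    Assumption: the converter prints CSV to stdout; a non-zero exit status
    raises subprocess.CalledProcessError because of check=True.
    """
    result = subprocess.run(
        [*command, path], capture_output=True, text=True, check=True
    )
    # Every value comes back as a string; convert types yourself if needed.
    return list(csv.reader(io.StringIO(result.stdout)))
```

Keeping the command configurable makes it easy to point at a different converter, or at an absolute path when xls2csv is not on PATH.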

Hans de Ridder · Answer 7 · 2018-10-28 11:53

For xlsx, I like the solution posted earlier at https://stackoverflow.com/a/166189/, which uses only modules from the standard library.

    import zipfile
    from xml.etree.ElementTree import iterparse

    def xlsx(fname):
        z = zipfile.ZipFile(fname)
        # Shared strings: cells of type "s" store an index into this list
        strings = [el.text for e, el in iterparse(z.open('xl/sharedStrings.xml'))
                   if el.tag.endswith('}t')]
        rows = []
        row = {}
        value = ''
        for e, el in iterparse(z.open('xl/worksheets/sheet1.xml')):
            if el.tag.endswith('}v'):  # cell value
                value = el.text
            if el.tag.endswith('}c'):  # end of a cell
                if el.attrib.get('t') == 's':
                    value = strings[int(value)]
                # Strip the row number from the reference (e.g. "B12" -> "B")
                letter = el.attrib['r']
                while letter[-1].isdigit():
                    letter = letter[:-1]
                row[letter] = value
                value = ''
            if el.tag.endswith('}row'):  # end of a row
                rows.append(row)
                row = {}
        return rows

Improvements added: loading contents by sheet name, using re to get the column, and checking whether shared strings are used.

    import re
    import zipfile
    from xml.etree.ElementTree import iterparse

    def xlsx(fname, sheet):
        z = zipfile.ZipFile(fname)
        strings = []
        if 'xl/sharedStrings.xml' in z.namelist():
            # Get shared strings
            strings = [element.text for event, element in iterparse(z.open('xl/sharedStrings.xml'))
                       if element.tag.endswith('}t')]
        # Map each sheet name to its sheetId
        sheets = {element.attrib['name']: element.attrib['sheetId']
                  for event, element in iterparse(z.open('xl/workbook.xml'))
                  if element.tag.endswith('}sheet')}
        rows = []
        row = {}
        value = ''
        if sheet in sheets:
            # Assumes sheetN.xml matches sheetId N; this is common but not guaranteed
            # (the authoritative mapping is in xl/_rels/workbook.xml.rels)
            sheetfile = 'xl/worksheets/sheet' + sheets[sheet] + '.xml'
            for event, element in iterparse(z.open(sheetfile)):
                # Get the value, or the index into the shared strings
                if element.tag.endswith('}v') or element.tag.endswith('}t'):
                    value = element.text
                # If the value is a shared string, use it as an index
                if element.tag.endswith('}c'):
                    if element.attrib.get('t') == 's':
                        value = strings[int(value)]
                    # Drop the row digits so only the column letter(s) remain
                    letter = re.sub(r'\d', '', element.attrib['r'])
                    row[letter] = value
                    value = ''
                if element.tag.endswith('}row'):
                    rows.append(row)
                    row = {}
        return rows
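Both xlsx functions return each row as a dict keyed by column letters, so empty cells, which Excel typically does not write out, are simply missing. A minimal sketch of turning such a row into a positional list, treating column names as bijective base-26 numbers (A = 1, ..., Z = 26, AA = 27); column_index and row_to_list are hypothetical helper names.

```python
from typing import Any, Dict, List


def column_index(letters: str) -> int:
    """Convert an Excel column name such as 'A', 'Z' or 'AA' to a 0-based index."""
    index = 0
    for ch in letters.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"not a column name: {letters!r}")
        index = index * 26 + (ord(ch) - ord("A") + 1)
    if index == 0:
        raise ValueError("empty column name")
    return index - 1  # bijective numbering starts at 1, Python lists at 0


def row_to_list(row: Dict[str, Any], fill: Any = None) -> List[Any]:
    """Place the values of a {column letter: value} dict into a list, filling gaps."""
    if not row:
        return []
    width = max(column_index(k) for k in row) + 1
    out = [fill] * width
    for letters, value in row.items():
        out[column_index(letters)] = value
    return out
```

For instance, a row parsed as {'A': 'x', 'C': '3'} becomes ['x', None, '3'].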

Gavin Smith · Answer 8 · 2013-09-18 20:35

For older Excel files, there is the OleFileIO_PL module, which can read the OLE structured storage format they use.
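Because legacy .xls files are OLE compound documents while .xlsx files are ZIP archives, the first bytes of a file indicate which kind of reader applies. A minimal sketch using only the standard library; excel_format is a hypothetical helper, and it assumes the standard signatures (D0 CF 11 E0 A1 B1 1A E1 for OLE, PK\x03\x04 for ZIP). The OLE signature alone does not prove the file is a workbook (Word .doc files share it), and ".xls" files that are really HTML or CSV exports come back as "unknown".

```python
# Standard file signatures ("magic numbers").
OLE_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE compound file, e.g. legacy .xls
ZIP_SIGNATURE = b"PK\x03\x04"  # ZIP container, e.g. .xlsx or .xlsm


def excel_format(path: str) -> str:
    """Guess the container format of a spreadsheet file from its first bytes."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head == OLE_SIGNATURE:
        return "xls"
    if head.startswith(ZIP_SIGNATURE):
        return "xlsx"
    return "unknown"
```

The result can then decide between an OLE-based reader such as xlrd and a ZIP-based one such as openpyxl.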

Jamieson Becker · Answer 9 · 2015-04-08 22:11

Python Excelerator also performs this task. http://ghantoos.org/2007/10/25/python-pyexcelerator-small-howto/

It is also available on Debian and Ubuntu:

  sudo apt-get install python-excelerator
