Extracting tables from a DOCX Word document in Python

I am trying to extract the contents of tables in a DOCX Word document, and I am new to XML / XPath.

from docx import *
document = opendocx('someFile.docx')
tableList = document.xpath('/w:tbl')

This throws an XPathEvalError: Undefined error. I am sure this is only the first of many errors to expect while developing the script. Unfortunately, I could not find a tutorial for python-docx.

Could you provide an example of retrieving a table?

2 answers

After some back and forth, we found that a namespace was needed for this to work properly. The xpath method is the right approach; it just needs the document's namespace map passed in first.

See the lxml documentation for the xpath method, in particular the part on namespaces.

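The working call, as reported by mgierdal:

tblList = document.xpath('//w:tbl', namespaces=document.nsmap)

The w: prefix is shorthand that has to be expanded to the full namespace name, and the mapping for that expansion is provided by document.nsmap.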
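To make the role of the prefix mapping concrete, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree rather than the lxml call above. The XML fragment is a hand-written stand-in for the word/document.xml part of a DOCX file, and the names SAMPLE_XML, NAMESPACES and tables_from_xml are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI that the "w:" prefix stands for in WordprocessingML.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Prefix-to-URI mapping, playing the same role as document.nsmap above.
NAMESPACES = {"w": W_NS}

# Hand-written stand-in for word/document.xml (illustrative only).
SAMPLE_XML = f"""
<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:tbl>
      <w:tr>
        <w:tc><w:p><w:r><w:t>Name</w:t></w:r></w:p></w:tc>
        <w:tc><w:p><w:r><w:t>Qty</w:t></w:r></w:p></w:tc>
      </w:tr>
      <w:tr>
        <w:tc><w:p><w:r><w:t>Bolt</w:t></w:r></w:p></w:tc>
        <w:tc><w:p><w:r><w:t>12</w:t></w:r></w:p></w:tc>
      </w:tr>
    </w:tbl>
  </w:body>
</w:document>
""".strip()


def tables_from_xml(xml_text):
    """Return every w:tbl element as a list of rows of cell strings."""
    root = ET.fromstring(xml_text)
    tables = []
    # Without the mapping, the "w:" prefix in the path cannot be resolved.
    for tbl in root.iterfind(".//w:tbl", NAMESPACES):
        rows = []
        for tr in tbl.iterfind("w:tr", NAMESPACES):
            row = []
            for tc in tr.iterfind("w:tc", NAMESPACES):
                # A cell's text can be split across several w:t runs.
                pieces = (t.text or "" for t in tc.iterfind(".//w:t", NAMESPACES))
                row.append("".join(pieces))
            rows.append(row)
        tables.append(rows)
    return tables


tables = tables_from_xml(SAMPLE_XML)
print(tables)
```

The search path ".//w:tbl" also matches tables nested inside cells, so a real document with nested tables would need extra handling.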

Using the python-docx package (imported as docx):

from docx import Document
document = Document(file_path)

tables = document.tables
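Once document.tables is available, the cell text can be collected by walking rows and cells. The following is a rough sketch under the assumption that a current python-docx release is installed; the helper names table_to_rows and read_tables are invented for this example and are not part of the library.

```python
def table_to_rows(table):
    """Return the text of each cell in a python-docx table, row by row."""
    return [[cell.text for cell in row.cells] for row in table.rows]


def read_tables(path):
    """Open a .docx file and return its top-level tables as nested lists."""
    # Imported here so table_to_rows can be used on its own.
    from docx import Document

    document = Document(path)
    return [table_to_rows(table) for table in document.tables]
```

Calling read_tables('someFile.docx') would then give one list per table, each holding one list of strings per row.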