Extracting tables from a docx word document in python

Question

I am trying to extract the contents of tables in a DOCX Word document, and I am new to XML and XPath.

from docx import *
document = opendocx('someFile.docx')
tableList = document.xpath('/w:tbl')

This throws an XPathEvalError: Undefined error. I expect this is only the first of several errors I will hit while developing the script. Unfortunately, I could not find a tutorial for python-docx.

Could you provide an example of retrieving a table?

+5

python ms-word xpath docx

mgierdal Aug 17 '11 at 18:27

2 answers

With the python-docx package, the tables can be read directly from the document object:

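from docx import Document

# file_path is the path to the .docx file
document = Document(file_path)

# Table objects for the top-level tables in the document body
tables = document.tables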

0

abdulsaboor 19 . '19 12:33

Spencer Rathbun · Accepted Answer · 2011-08-18T19:18:26+0000

After some back and forth, we found that a namespace is needed for this to work properly. The xpath method is the right approach; it just needs the document namespace passed in.

The documentation for the lxml xpath method covers namespace handling, including how to pass a namespaces dictionary.

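As mgierdal explained in a comment:

tblList = document.xpath('//w:tbl', namespaces=document.nsmap) works. The w: prefix is shorthand that has to be expanded to the full namespace name, and document.nsmap provides the dictionary for that expansion.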
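
To illustrate the prefix-to-namespace mapping without depending on a particular python-docx version, here is a rough sketch that uses only the standard library's xml.etree.ElementTree instead of lxml. The SAMPLE_XML string is a hand-written stand-in for word/document.xml, and extract_tables is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace; the "w" prefix is shorthand for this URI.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NSMAP = {"w": W_NS}

# Tiny stand-in for word/document.xml (not a complete Word document).
SAMPLE_XML = f"""<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:tbl>
      <w:tr>
        <w:tc><w:p><w:r><w:t>A1</w:t></w:r></w:p></w:tc>
        <w:tc><w:p><w:r><w:t>B1</w:t></w:r></w:p></w:tc>
      </w:tr>
      <w:tr>
        <w:tc><w:p><w:r><w:t>A2</w:t></w:r></w:p></w:tc>
        <w:tc><w:p><w:r><w:t>B2</w:t></w:r></w:p></w:tc>
      </w:tr>
    </w:tbl>
  </w:body>
</w:document>"""


def extract_tables(xml_text):
    """Return every w:tbl element as a list of rows of cell text."""
    root = ET.fromstring(xml_text)
    tables = []
    # The prefix map tells the search what "w:" stands for.
    for tbl in root.iterfind(".//w:tbl", NSMAP):
        rows = []
        for tr in tbl.iterfind("w:tr", NSMAP):
            row = []
            for tc in tr.iterfind("w:tc", NSMAP):
                # Join all text runs found anywhere inside the cell.
                texts = (t.text or "" for t in tc.iterfind(".//w:t", NSMAP))
                row.append("".join(texts))
            rows.append(row)
        tables.append(rows)
    return tables


print(extract_tables(SAMPLE_XML))
```

For a real file, the XML could be read with zipfile.ZipFile(path).read('word/document.xml') and passed to the same function. Note that './/w:tbl' also matches tables nested inside cells, so those would appear both as separate entries and as part of the enclosing cell's text.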
