Python-docx page number

I am trying to create a python program that can find a specific word in a .docx file and return the page number on which it occurred. So far, while looking at the python-docx documentation, I have not been able to find a way to access the page number or even the footer where the number will be. Is there a way to do this using python-docx or even just python? Or, if not, what would be the best way to do this?

+6
source share
2 answers

The short answer is no, because page breaks are inserted by a rendering engine not defined by the .docx file itself.

<w:lastRenderedPageBreak> XML, , , .

, ( , Word ) , , , Python. python-docx lxml (, w:document/w:body), XPath -, , , , .

Windows MS Office API, - , Word.

python-docx, , ( ). w: lastRenderedPageBreak ; , .

"lastRenderedPageBreak" / " python-docx", /, .

+6

Python-docx:

from docx import Document
fn='1.doc'
document = Document(fn)
pn=1    
import re
for p in document.paragraphs:
    r=re.match('Chapter \d+',p.text)
    if r:
        print(r.group(),pn)
    for run in p.runs:
        if 'w:br' in run._element.xml and 'type="page"' in run._element.xml:
            pn+=1
            print('!!','='*50,pn)
0

All Articles