Python: create a table of contents with python-docx / lxml

I am trying to automate the creation of .docx files (WordML) using python-docx ( https://github.com/mikemaccana/python-docx ). My current script builds the ToC manually with the following loop:

  for chapter in myChapters:
      body.append(paragraph(chapter.text, style='ListNumber'))

Does anyone know how to use Word's built-in ToC function, which automatically adds the index and also creates links to the individual chapters?

Thank you so much!

2 answers

The main problem is that a rendered ToC depends on pagination to know which page number to show for each heading. Pagination is a function of the layout engine, a very complex piece of software built into the Word client. Writing a page layout engine in Python is probably not a good idea, and definitely not a project I plan to undertake in the near future :)

A ToC consists of two parts:

  • the element that specifies where the ToC goes and options such as which heading levels to include.
  • the actual visible ToC content: headings and page numbers, with dotted leader lines connecting them.

Creating the element is fairly simple and takes relatively little work. Creating the actual visible content, at least if you need page numbers, requires the Word layout engine.

These are the options:

  • Just add the tag and a few other bits to signal to Word that the ToC needs to be updated. When the document is first opened, a dialog box appears saying that links need to be updated. The user clicks "Yes" and Bob's your uncle. If the user clicks "No", the ToC heading appears with no content below it, and the ToC can be updated manually.

  • Add the tag and then drive a Word client through the Word Automation library, using C# or Visual Basic, to open and save the file; all fields (including the ToC field) get updated.

  • Do the same thing server-side if you have a SharePoint instance or something else that can run Word Automation Services.

  • Create an AutoOpen macro in the document that automatically runs the field update when the document is opened. This will probably not get past many virus checkers and won't work on the locked-down Windows builds that are common in corporate settings.

Here's a very good set of screencasts by Eric White that explains all the hairy details.


Sorry for commenting on an old post, but I think this might be helpful. This is not my solution; I found it here: https://github.com/python-openxml/python-docx/issues/36 Thanks to https://github.com/mustash and https://github.com/scanny

  from docx.oxml.ns import qn
  from docx.oxml import OxmlElement

  # self.document is a python-docx Document instance
  paragraph = self.document.add_paragraph()
  run = paragraph.add_run()

  fldChar = OxmlElement('w:fldChar')  # creates a new element
  fldChar.set(qn('w:fldCharType'), 'begin')  # sets attribute on element

  instrText = OxmlElement('w:instrText')
  instrText.set(qn('xml:space'), 'preserve')  # sets attribute on element
  # raw string: otherwise Python 3 rejects '\u' as a truncated unicode escape
  instrText.text = r'TOC \o "1-3" \h \z \u'  # change 1-3 depending on heading levels you need

  fldChar2 = OxmlElement('w:fldChar')
  fldChar2.set(qn('w:fldCharType'), 'separate')
  fldChar3 = OxmlElement('w:t')  # placeholder text shown until the field is updated
  fldChar3.text = "Right-click to update field."
  fldChar2.append(fldChar3)

  fldChar4 = OxmlElement('w:fldChar')
  fldChar4.set(qn('w:fldCharType'), 'end')

  # add the field pieces to the run in begin / instruction / separate / end order
  r_element = run._r
  r_element.append(fldChar)
  r_element.append(instrText)
  r_element.append(fldChar2)
  r_element.append(fldChar4)
  p_element = paragraph._p
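
If the heading range in the instruction text above needs to vary, a small helper can build the string. This is just a sketch and `toc_instruction` is a made-up name, not part of python-docx. In the switches, `\o "1-3"` picks the heading levels, `\h` turns entries into hyperlinks, `\z` hides the tab leader and page numbers in Web layout view, and `\u` uses the applied paragraph outline level.

```python
def toc_instruction(first_level=1, last_level=3, hyperlinks=True):
    """Hypothetical helper: build the instruction text for a Word TOC field."""
    if not 1 <= first_level <= last_level <= 9:
        raise ValueError("heading levels must satisfy 1 <= first <= last <= 9")
    # Raw strings keep the backslashes of the field switches intact.
    switches = ['TOC', r'\o "{}-{}"'.format(first_level, last_level)]
    if hyperlinks:
        switches.append(r'\h')
    switches += [r'\z', r'\u']
    return ' '.join(switches)


print(toc_instruction(1, 2))  # prints: TOC \o "1-2" \h \z \u
```

The result can then be assigned to instrText.text in place of the hard-coded string.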