Well, I had never thought about this before, but I just created a test.docx with a header and a footer. A docx file is a zip archive, so once you have it you can unzip it to get the XML files inside. For my simple test, this gave:
word/ _rels footer1.xml styles.xml document.xml footnotes.xml stylesWithEffects.xml endnotes.xml header1.xml theme fontTable.xml settings.xml webSettings.xml
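The same listing can be produced from Python without unzipping by hand. This is only a minimal sketch using the standard zipfile module; `list_docx_parts` is a hypothetical helper name, and the archive in the demo is a tiny stand-in built in memory, not a real Word document:

```python
import io
import zipfile


def list_docx_parts(docx_file):
    """Return the sorted names of all files stored inside a docx archive.

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as docx:
        return sorted(docx.namelist())


if __name__ == "__main__":
    # Build a small stand-in archive in memory (assumption: illustration only).
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("word/document.xml", "<doc/>")
        z.writestr("word/header1.xml", "<hdr/>")
    buf.seek(0)
    for name in list_docx_parts(buf):
        print(name)
```

With a real file you would pass the path, e.g. `list_docx_parts("test.docx")`.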
Opening word/document.xml shows you the main area of interest. You can see the elements that deal with the header and footer. In my simple case, I got:
<w:headerReference w:type="default" r:id="rId7"/> <w:footerReference w:type="default" r:id="rId8"/>
and
<w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="720" w:footer="720" w:gutter="0"/>
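To pull those values out programmatically, something like the following might work. It is a rough sketch with the standard library's ElementTree; `read_section_info` and `SAMPLE` are names made up for this illustration, and the sample XML is a cut-down version of the elements shown above. The margin attributes are stored in twentieths of a point, so 1440 corresponds to one inch.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
NS = {"w": W, "r": R}
TWIPS_PER_INCH = 1440  # margins are stored in twentieths of a point

# Cut-down sample (assumption: only the elements discussed here).
SAMPLE = (
    f'<w:document xmlns:w="{W}" xmlns:r="{R}"><w:body><w:sectPr>'
    '<w:headerReference w:type="default" r:id="rId7"/>'
    '<w:footerReference w:type="default" r:id="rId8"/>'
    '<w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" '
    'w:header="720" w:footer="720" w:gutter="0"/>'
    '</w:sectPr></w:body></w:document>'
)


def read_section_info(document_xml):
    """Collect header/footer relationship ids and page margins (in inches).

    If the document has several sections, later ones overwrite earlier ones.
    """
    root = ET.fromstring(document_xml)
    info = {"header": {}, "footer": {}, "margins_in": {}}
    for sect in root.iter(f"{{{W}}}sectPr"):
        for kind in ("header", "footer"):
            for ref in sect.findall(f"w:{kind}Reference", NS):
                # Map the reference type ("default", ...) to its relationship id.
                info[kind][ref.get(f"{{{W}}}type")] = ref.get(f"{{{R}}}id")
        margins = sect.find("w:pgMar", NS)
        if margins is not None:
            for key, value in margins.attrib.items():
                # Keys look like "{namespace}top"; keep only the local name.
                local = key.rsplit("}", 1)[-1]
                info["margins_in"][local] = int(value) / TWIPS_PER_INCH
    return info


if __name__ == "__main__":
    print(read_section_info(SAMPLE))
```

The ids such as rId7 are not file names themselves; they point into the relationships stored under word/_rels, which is where header1.xml and footer1.xml get linked in.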
The whole document is really small, so here it is:
<w:document
    xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas"
    xmlns:mo="http://schemas.microsoft.com/office/mac/office/2008/main"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:mv="urn:schemas-microsoft-com:mac:vml"
    xmlns:o="urn:schemas-microsoft-com:office:office"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
    xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
    xmlns:v="urn:schemas-microsoft-com:vml"
    xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing"
    xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
    xmlns:w10="urn:schemas-microsoft-com:office:word"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml"
    xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup"
    xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk"
    xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
    xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape"
    mc:Ignorable="w14 wp14">
  <w:body>
    <w:p w:rsidR="009E6E8F" w:rsidRDefault="009E6E8F"/>
    <w:p w:rsidR="00B53FFA" w:rsidRDefault="00B53FFA"/>
    <w:p w:rsidR="00B53FFA" w:rsidRDefault="00B53FFA"/>
    <w:p w:rsidR="00B53FFA" w:rsidRDefault="00B53FFA">
      <w:r>
        <w:t>MY BODY</w:t>
      </w:r>
      <w:bookmarkStart w:id="0" w:name="_GoBack"/>
      <w:bookmarkEnd w:id="0"/>
    </w:p>
    <w:sectPr w:rsidR="00B53FFA" w:rsidSect="009E6E8F">
      <w:headerReference w:type="default" r:id="rId7"/>
      <w:footerReference w:type="default" r:id="rId8"/>
      <w:pgSz w:w="12240" w:h="15840"/>
      <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="720" w:footer="720" w:gutter="0"/>
So XML manipulation will not be a problem, in either functionality or performance, for something this size. Here is some Python code that reads your document, lets you work with it as an XML tree, and saves it back as a docx. I have to go now, so this is not a complete solution, but I think it should get you well on your way. If you still have problems, I will come back later and see where you are with it.
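import os
import shutil
import tempfile
import zipfile
import xml.etree.ElementTree as ET


def get_word_xml(docx_filename):
    """Return the raw bytes of word/document.xml from a docx file."""
    # A docx is a zip archive; ZipFile opens the path in binary mode itself.
    with zipfile.ZipFile(docx_filename) as docx:
        return docx.read('word/document.xml')


def write_and_close_docx(docx_filename, xml_root, output_filename):
    """
    Create a temp directory, expand the original docx zip.
    Write the modified xml to word/document.xml
    Zip it up as the new docx
    """
    tmp_dir = tempfile.mkdtemp()
    try:
        with zipfile.ZipFile(docx_filename) as docx:
            docx.extractall(tmp_dir)
            filenames = docx.namelist()
        # xml_root is an Element, e.g. ET.fromstring(get_word_xml(...)).
        # Note: ElementTree renames namespace prefixes on output unless they
        # are registered first with ET.register_namespace.
        document_path = os.path.join(tmp_dir, 'word', 'document.xml')
        ET.ElementTree(xml_root).write(
            document_path, encoding='UTF-8', xml_declaration=True)
        # Re-zip every original entry under its original archive name.
        with zipfile.ZipFile(output_filename, 'w', zipfile.ZIP_DEFLATED) as out:
            for name in filenames:
                out.write(os.path.join(tmp_dir, name), name)
    finally:
        # Always clean up the temporary directory.
        shutil.rmtree(tmp_dir)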
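If the temporary directory feels heavy, one alternative is to copy the archive entry by entry and swap out a single part on the way. This is a different technique from the extract-and-rezip approach, and only a sketch: `rewrite_docx_part` and its `transform` callback are hypothetical names, and the callback works on raw bytes so that any XML parsing is left to the caller.

```python
import io
import zipfile


def rewrite_docx_part(src, dst, part_name, transform):
    """Copy the archive src to dst, replacing one part with transform(old_bytes).

    src and dst may be paths or binary file-like objects.
    """
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == part_name:
                data = transform(data)
            # Reuse the original entry name so the archive layout is unchanged.
            zout.writestr(item.filename, data)


if __name__ == "__main__":
    # Stand-in archive built in memory (assumption: not a real Word document).
    src = io.BytesIO()
    with zipfile.ZipFile(src, "w") as z:
        z.writestr("word/document.xml", b'<w:pgMar w:left="1800"/>')
        z.writestr("word/header1.xml", b"<hdr/>")
    src.seek(0)

    dst = io.BytesIO()
    # Narrow the left margin from 1800 to 1440 with a plain bytes replace.
    rewrite_docx_part(
        src, dst, "word/document.xml",
        lambda xml: xml.replace(b'w:left="1800"', b'w:left="1440"'))

    dst.seek(0)
    with zipfile.ZipFile(dst) as z:
        print(z.read("word/document.xml"))
```

Every other part, such as header1.xml, is copied through untouched, so only the part you name is ever modified.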