docx4j can be used to convert OpenXML to arbitrary XML through XSLT.
Assuming xslt is a javax.xml.transform.Templates and result is a javax.xml.transform.stream.StreamResult, you would do something like this:
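    import org.docx4j.XmlUtils;
    import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
    import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;

    // Load the docx and get its main document part
    WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new java.io.File(inputfilepath));
    MainDocumentPart mdp = wordMLPackage.getMainDocumentPart();

    // DOM document to input to transform
    org.w3c.dom.Document doc = XmlUtils.marshaltoW3CDomDocument(mdp.getJaxbElement());

    // Apply the stylesheet (no transform parameters) and write to result
    XmlUtils.transform(doc, xslt, null, result);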
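The snippet above takes xslt and result as given. As a rough sketch of how they might be set up with the standard JAXP API, the following compiles a stylesheet into a Templates object and collects the output through a StreamResult. The stylesheet, the `<doc>`/`<para>` target format, and the class and method names are all made up for illustration; the transform is applied through plain JAXP here rather than through docx4j's XmlUtils.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of docx4j.
class XsltSetupSketch {

    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Illustrative stylesheet: emits one <para> per WordprocessingML paragraph.
    // The <doc>/<para> target vocabulary is an assumption for this example.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:w='" + W_NS + "' exclude-result-prefixes='w'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"
        + "<doc><xsl:apply-templates select='//w:p'/></doc>"
        + "</xsl:template>"
        + "<xsl:template match='w:p'>"
        + "<para><xsl:value-of select='.'/></para>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Compile the stylesheet once; a Templates object can be reused.
    static Templates compile(String xsl) throws Exception {
        return TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(xsl)));
    }

    // Parse XML text into a namespace-aware DOM (needed for w:* matching).
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Run the compiled stylesheet over the DOM and return the output as text.
    static String apply(Templates xslt, Document doc) throws Exception {
        StringWriter out = new StringWriter();
        StreamResult result = new StreamResult(out);
        xslt.newTransformer().transform(new DOMSource(doc), result);
        return out.toString();
    }
}
```

With docx4j on the classpath, the Templates returned by compile(...) and a StreamResult built the same way could be handed to XmlUtils.transform as in the snippet above.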
However, if all you want to do is convert to XML, then docx4j (and Apache POI, for that matter) is overkill. You could just use OpenXML4J directly.
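To show how little is needed at this level: a docx is a zip package, and the main document part is itself XML. The sketch below does not use OpenXML4J; it swaps in the JDK's own zip and JAXP classes to feed that part to a transformer. It assumes the main part lives at word/document.xml, which is where Word normally writes it; strictly, the location should be looked up through the package relationships. The class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.transform.Result;
import javax.xml.transform.Transformer;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class; uses only the JDK, not OpenXML4J.
class DocxMainPartSketch {

    // Assumption: conventional location of the main document part.
    static final String MAIN_PART = "word/document.xml";

    // Stream the main document part out of the zip and through the transformer.
    static void transformMainPart(File docx, Transformer transformer, Result result)
            throws Exception {
        try (ZipFile zip = new ZipFile(docx)) {
            ZipEntry entry = zip.getEntry(MAIN_PART);
            if (entry == null) {
                throw new IOException(MAIN_PART + " not found in " + docx);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                transformer.transform(new StreamSource(in), result);
            }
        }
    }
}
```

Passing a transformer created from your own stylesheet gives you the target XML; passing an identity transformer (TransformerFactory.newInstance().newTransformer()) just copies the raw WordprocessingML out.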
Whether converting via XSLT is the best approach probably depends on whether your target XML is document oriented or data oriented.
If it is document oriented, XSLT is a good approach.
If it is data oriented, you may want to consider content control data binding. (There was a different approach called customXml, but the i4i patent farce may make that approach unsuitable if you rely on Word for editing.)