There are already pipelines for processing the XML form of MS Office documents. Attach the Office OpenXML Extract and WordprocessingML Process pipelines to your domain. You won't get the full upconversion to DocBook that you would get from binary (.doc) MS Word documents, but we'll clean up the XML a bit, and you can add your own transformations at the end.
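To get a feel for the XML these pipelines work on, here is a minimal Python sketch that opens a .docx package (a zip archive) and reads the text of each paragraph from `word/document.xml`. It is only an illustration of the file format, not what the pipelines themselves run; the function name `extract_paragraphs` is made up for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as w:p and w:t.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_paragraphs(docx_bytes):
    """Return the plain text of each w:p paragraph in word/document.xml.

    Hypothetical helper for illustration only.
    """
    # A .docx file is a zip package; the main body lives in word/document.xml.
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # Text is split across runs, so join every w:t inside the paragraph.
        text = "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs
```

Call it with the raw bytes of a .docx file, for example `extract_paragraphs(open("report.docx", "rb").read())`, where `report.docx` stands in for your own document.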