How to extract the contents of an OLE container?

I need to open an MS Word file (.doc) and extract the compound files inside it ('[1] CompObj', 'WordDocument', etc.). A tool like 7-Zip can do this manually, but I need to do it programmatically.

I realized that a Word document is an OLE container (which is why 7-Zip can show its contents), but I can't figure out how to do the following in C++:

  • open the OLE container
  • extract each compound file and save it to disk
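Before opening the container with any API, it can help to confirm that a file really is an OLE compound file. The sketch below assumes only the documented compound-file header: such files start with the fixed 8-byte signature `D0 CF 11 E0 A1 B1 1A E1` (a .docx, by contrast, is a ZIP package and starts with `PK`). `HasOleSignature` and `IsOleCompoundFile` are hypothetical helper names, not part of any library.

```cpp
#include <array>
#include <cstddef>
#include <fstream>
#include <string>

// Fixed 8-byte signature at offset 0 of every OLE compound file,
// including Word 97-2003 .doc files.
constexpr std::array<unsigned char, 8> kOleSignature = {
    0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

// Hypothetical helper: true if `header` (of length `size`) starts with the signature.
bool HasOleSignature(const unsigned char* header, std::size_t size) {
    if (size < kOleSignature.size()) return false;
    for (std::size_t i = 0; i < kOleSignature.size(); ++i) {
        if (header[i] != kOleSignature[i]) return false;
    }
    return true;
}

// Hypothetical helper: reads the first 8 bytes of the file at `path`
// and checks them; returns false if the file cannot be opened.
bool IsOleCompoundFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    unsigned char header[8] = {};
    in.read(reinterpret_cast<char*>(header), sizeof header);
    return HasOleSignature(header, static_cast<std::size_t>(in.gcount()));
}
```

This only tells you the file is a compound file; it does not parse the directory or streams inside it.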

I found several examples of OLE automation (for example, here), but what I want to do seems to be less common, and I couldn't find any specific examples.

If anyone knows which API (?!) to use, or a tutorial on working with OLE, I would be grateful. Any code samples are welcome too.

It is called Compound Files, part of the Structured Storage API. You start with StgOpenStorageEx(). That doesn't buy you much for a Word .doc file, though; the streams themselves use a complex binary format. To actually read the contents of the document, you want to use automation and let Word read the file. That is rarely done in C++, but this project shows how to do it.
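For the raw extraction the question asks about, a minimal sketch of walking a compound file with the Structured Storage API might look like the code below. It is Windows-only (guarded by `_WIN32`), assumes C++17 for `std::filesystem`, and assumes linking against `ole32.lib` and `uuid.lib` (for `IID_IStorage`). It opens the root storage with `StgOpenStorageEx`, enumerates elements with `IStorage::EnumElements`, copies each stream to a file, and recurses into sub-storages as subdirectories. `ToSafeFileName`, `ExtractStorage` and `ExtractCompoundFile` are hypothetical names; the `[N]` renaming of control characters is an assumption that mimics how 7-Zip displays names like `[1] CompObj`, since the real stream name begins with the character 0x01.

```cpp
#include <string>

// Hypothetical helper: stream names may start with control characters
// (e.g. "\x01" "CompObj"); replace those with "[N]" and replace characters
// that are invalid in Windows file names with '_'.
std::wstring ToSafeFileName(const std::wstring& name) {
    std::wstring out;
    for (wchar_t ch : name) {
        if (ch < 0x20) {
            out += L'[';
            out += std::to_wstring(static_cast<int>(ch));
            out += L']';
        } else if (std::wstring(L"/\\:*?\"<>|").find(ch) != std::wstring::npos) {
            out += L'_';
        } else {
            out += ch;
        }
    }
    return out;
}

#ifdef _WIN32
#include <windows.h>   // pulls in IStorage, IStream, StgOpenStorageEx
#include <filesystem>
#include <fstream>

// Copies every stream in `storage` into files under `outDir`;
// sub-storages become subdirectories.
HRESULT ExtractStorage(IStorage* storage, const std::filesystem::path& outDir) {
    std::error_code ec;
    std::filesystem::create_directories(outDir, ec);  // no throw on failure
    IEnumSTATSTG* it = nullptr;
    HRESULT hr = storage->EnumElements(0, nullptr, 0, &it);
    if (FAILED(hr)) return hr;
    STATSTG st;
    while (it->Next(1, &st, nullptr) == S_OK) {
        const std::filesystem::path target = outDir / ToSafeFileName(st.pwcsName);
        if (st.type == STGTY_STREAM) {
            IStream* stream = nullptr;
            if (SUCCEEDED(storage->OpenStream(st.pwcsName, nullptr,
                    STGM_READ | STGM_SHARE_EXCLUSIVE, 0, &stream))) {
                std::ofstream out(target, std::ios::binary);
                char buf[4096];
                ULONG got = 0;
                // Read returns S_FALSE (still SUCCEEDED) with got == 0 at end.
                while (SUCCEEDED(stream->Read(buf, sizeof buf, &got)) && got > 0) {
                    out.write(buf, static_cast<std::streamsize>(got));
                }
                stream->Release();
            }
        } else if (st.type == STGTY_STORAGE) {
            IStorage* sub = nullptr;
            if (SUCCEEDED(storage->OpenStorage(st.pwcsName, nullptr,
                    STGM_READ | STGM_SHARE_EXCLUSIVE, nullptr, 0, &sub))) {
                ExtractStorage(sub, target);
                sub->Release();
            }
        }
        CoTaskMemFree(st.pwcsName);  // names returned by Next are caller-owned
    }
    it->Release();
    return S_OK;
}

// Opens `docPath` as a compound file and extracts its contents into `outDir`.
HRESULT ExtractCompoundFile(const std::wstring& docPath, const std::wstring& outDir) {
    IStorage* root = nullptr;
    HRESULT hr = StgOpenStorageEx(docPath.c_str(), STGM_READ | STGM_SHARE_DENY_WRITE,
                                  STGFMT_STORAGE, 0, nullptr, nullptr, IID_IStorage,
                                  reinterpret_cast<void**>(&root));
    if (FAILED(hr)) return hr;
    hr = ExtractStorage(root, outDir);
    root->Release();
    return hr;
}
#endif
```

A call such as `ExtractCompoundFile(L"C:\\docs\\sample.doc", L"C:\\docs\\sample_streams");` (hypothetical paths) would leave files like `WordDocument` and `[1]CompObj` on disk. As the answer says, those files are still Word's binary stream format; this only gets them out of the container.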
