Understand that almost any answer to this question will depend on the limitations of the .doc files you are working with ...
It seems to me that the first option, if you are going to do this, is to convert them to a simpler format. RTF is a good example, and if you can get them into that format, O'Reilly's RTF Pocket Guide is a great resource for understanding the structure of the files. Converting the files is pretty simple if you can install AbiWord on a Linux machine. From the command line, you simply run:
abiword --to=rtf some_file_name.doc
Of course, from Ruby, you would simply shell out to these commands.
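Shelling out from Ruby could look roughly like the sketch below. The helper names (abiword_command, converted_path, convert_doc) are made up for illustration, and the sketch assumes that abiword is on the PATH and that it writes its output next to the input file with the new extension.

```ruby
# Build the argument list for an AbiWord conversion (hypothetical helper).
def abiword_command(path, format)
  ["abiword", "--to=#{format}", path]
end

# Guess where the converted file ends up.
# Assumption: AbiWord writes it beside the input with the new extension.
def converted_path(path, format)
  path.sub(/\.doc\z/i, ".#{format}")
end

# Run the conversion and return the expected output path.
def convert_doc(path, format: "rtf")
  # Passing separate arguments to system avoids shell quoting problems
  # with file names that contain spaces.
  ok = system(*abiword_command(path, format))
  raise "abiword failed for #{path}" unless ok
  converted_path(path, format)
end
```

Calling convert_doc("some_file_name.doc") would then run the same command as above and return "some_file_name.rtf", provided the assumption about the output location holds on your system.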
The merge itself is more complicated, and it will depend on your files. You will need to make some programmatic decisions about how you intend to combine each individual document's stylesheets, font tables, and so on. The content sits in the middle of the RTF file, but all the semantic and stylistic data around it is what you will have to deal with. There is no "one way" here, simply because it depends on what you want on the other side. This is where the RTF Pocket Guide is a great help: basically, you will want to use it to understand the structure of your RTF and decide what you want to keep and what you don't.
Otherwise, if you just need the content with NONE of the semantics, you can always convert the files to txt and then combine them. The command is very similar:
abiword --to=txt some_file_name.doc
It's dead simple: it just spits out the text, and you can concatenate the results and be done with it. But then again, you will lose ALL formatting of any type.
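Combining the converted text files is then a plain concatenation. A minimal sketch in Ruby, where merge_text_files and the separator argument are hypothetical names chosen for this example:

```ruby
# Concatenate already-converted .txt files into a single output file.
# The separator is written between files, not after the last one.
def merge_text_files(inputs, output, separator: "\n")
  File.open(output, "w") do |out|
    inputs.each_with_index do |path, index|
      out.write(separator) if index > 0
      out.write(File.read(path))
    end
  end
  output
end
```

For example, merge_text_files(["a.txt", "b.txt"], "merged.txt") writes the contents of a.txt, a newline, and then the contents of b.txt into merged.txt.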
jasonpgignac