Combining Word Documents in Ruby

I have N Word documents (Office 2003) that I want to combine, in a given order, into a single Word document. How can I do this in Ruby? Thank you.

These are just ordinary documents created in MS Office. I do not use Windows and would prefer a non-Windows solution.

EDIT: Would it be easier if the documents were ODT files rather than DOC files?

3 answers

The only non-Windows solution I know of is the Ruby bindings for Apache POI. With those, the code would look much like this .NET code: "Combine Word documents as pages of a single document using VB.NET". The key call you need is Selection.InsertFile, which lets you insert as many files as you like, in the order you choose.

For merging ODT documents, see this thread: http://cpanforum.com/threads/9938

There is a whole series of really good articles about Word and Ruby at http://rubyonwindows.blogspot.com/search/label/word . Word files are really complex, at least those from before Word 2007, so you are better off automating Word itself to do this.

Understand that almost any answer to this question will depend on the constraints of the .doc files you are working with...

It seems to me that, if you are going to do this yourself, the first option is to convert the files to a simpler format. RTF is a great example, and if you can get them into that format, O'Reilly's RTF Pocket Guide is a GREAT resource for understanding the file structure. Converting the files is pretty simple if you can install AbiWord on a Linux machine. From the command line, you simply run:

 abiword --to=rtf some_file_name.doc 

Of course, from Ruby you can simply run these commands through a system call.
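
Shelling out from Ruby could look roughly like the minimal sketch below. The helper names (`abiword_command`, `converted_path`, `convert_all`) are hypothetical, and the rule that AbiWord writes `name.rtf` next to `name.doc` is an assumption worth checking against your AbiWord version.

```ruby
# Hypothetical helpers wrapping the AbiWord command line shown above.
# ASSUMPTIONS: `abiword` is on the PATH, and it writes e.g. report.doc -> report.rtf
# in the same directory as the input file.

# Build the argument list for one conversion, e.g. abiword --to=rtf report.doc
def abiword_command(path, format)
  ["abiword", "--to=#{format}", path]
end

# Where we expect AbiWord to put the converted file (same directory, same base name).
def converted_path(path, format)
  File.join(File.dirname(path), "#{File.basename(path, '.*')}.#{format}")
end

# Convert every file, keeping the given order, and return the converted paths.
def convert_all(paths, format)
  paths.map do |path|
    # Array form of Kernel#system: no shell is involved, so spaces in names are safe.
    # system returns nil if abiword could not be started, false on a non-zero exit.
    raise "abiword failed on #{path}" unless system(*abiword_command(path, format))

    converted_path(path, format)
  end
end
```

Usage would be something like `convert_all(%w[ch1.doc ch2.doc ch3.doc], "rtf")`, which returns the RTF paths in the same order, ready to be merged.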

The merge itself is more complicated, and it will depend on your files. You will need to make some programming decisions about whether you intend to merge the style sheets, font tables, etc. of each individual document. The content sits in the middle of each RTF file, but it is surrounded by all the semantic and style data that you will have to deal with. There is no "one way" here, simply because it depends on what you want to end up with. This is where the RTF Pocket Guide is a great help: basically, you will want to use it to understand the structure of your RTF and decide what you want to keep and what you don't.
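
To make that trade-off concrete, here is a deliberately naive Ruby sketch that takes the simplest possible decision: keep the first document's header (font table, colour table, style sheet, etc.) and append the body of every document, separated by an RTF `\page` break. It ASSUMES all inputs share identical font, colour and style tables (for example, they were all produced by the same converter from similarly styled sources) and contain no `\bin` binary data. The names `split_rtf`, `merge_rtf` and `HEADER_GROUP` are hypothetical, and the list of header destinations is not exhaustive.

```ruby
# Naive RTF merge sketch. ASSUMPTIONS: every input uses the same font/colour/style
# tables, so the first document's header can be reused; no \bin binary data.

# Top-level groups treated as "header" tables (hypothetical, non-exhaustive list).
HEADER_GROUP = /\A\{(?:\\\*)?\\(?:fonttbl|colortbl|stylesheet|info|listtable|listoverridetable)(?![a-z])/

# Split an RTF string into [header, body]: the header runs from "{\rtf1" to the end of
# the last top-level header table; the body is the rest, minus the final closing "}".
def split_rtf(rtf)
  raise ArgumentError, "not an RTF document" unless rtf.start_with?("{\\rtf")

  depth = 0
  group_start = nil
  header_end = nil
  close_index = nil
  i = 0
  while i < rtf.length
    case rtf[i]
    when "\\"
      i += 2 # skip backslash + next char, so \{ \} and \\ never count as braces
      next
    when "{"
      depth += 1
      group_start = i if depth == 2
    when "}"
      header_end = i + 1 if depth == 2 && rtf[group_start..i].match?(HEADER_GROUP)
      depth -= 1
      if depth.zero?
        close_index = i
        break
      end
    end
    i += 1
  end
  raise ArgumentError, "unbalanced braces" unless close_index

  header_end ||= rtf[/\A\{\\rtf\d*/].length # no tables found: header is just "{\rtf1"
  [rtf[0...header_end], rtf[header_end...close_index]]
end

# Keep the first document's header, then append every body in order,
# separated by an RTF page break (\page), and close the outer group.
def merge_rtf(documents)
  raise ArgumentError, "need at least one document" if documents.empty?

  header, first_body = split_rtf(documents.first)
  bodies = [first_body] + documents.drop(1).map { |doc| split_rtf(doc).last }
  "#{header}#{bodies.join("\\page\n")}}"
end
```

If your documents use different fonts or styles, this is exactly where you would have to renumber the `\f` and `\s` references and merge the tables instead of dropping them.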

Otherwise, if you just need the content with NONE of the semantics, you can always convert the files to .txt and then concatenate them. The command is very similar:

 abiword --to=txt some_file_name.doc 

It's dead simple: it just dumps the text, and you can run it and be done with it. But then again, you will lose ALL formatting of any kind.
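
Once the `.txt` files exist, joining them in order is only a few lines of Ruby. This is a minimal sketch; `concatenate_text_files` and its `separator:` parameter are hypothetical names, and it assumes the inputs are the plain-text files produced by `abiword --to=txt`.

```ruby
# Concatenate already-converted .txt files, in the given order, into one file.
# ASSUMPTION: inputs came from `abiword --to=txt`, so there is no formatting left to keep.
def concatenate_text_files(txt_paths, output_path, separator: "\n")
  File.open(output_path, "w") do |out|
    txt_paths.each_with_index do |path, index|
      out.write(separator) unless index.zero? # separator goes between files only
      out.write(File.read(path))
    end
  end
  output_path
end
```

For example, `concatenate_text_files(%w[ch1.txt ch2.txt ch3.txt], "book.txt")` writes the three files into `book.txt` in that order.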
