Java: replace a string in an MS Word file

We need a Java library to replace strings in MS Word files.

Can anyone suggest one?

+6
java ms-word
5 answers

While Apache POI does have MS Word support, it is not very good. Loading and then saving any file with anything beyond the most basic formatting will most likely garble the layout. You should still try it, though; perhaps it works for you.

There are also a number of commercial libraries, but I do not know whether any of them are any good.

The crappy "solution" I had to settle for when working on a similar requirement recently was to use the DOCX format: open the ZIP container, read the document XML, and replace my markers with the required texts. This works for replacing simple bits of text that do not contain paragraphs, etc.

    // Resource and ClassPathResource look like Spring classes, and IOUtils
    // looks like Apache Commons IO; output is the target OutputStream.
    private static final String WORD_TEMPLATE_PATH = "word/word_template.docx";
    private static final String DOCUMENT_XML = "word/document.xml";

    /*....*/

    final Resource templateFile = new ClassPathResource(WORD_TEMPLATE_PATH);
    final ZipInputStream zipIn = new ZipInputStream(templateFile.getInputStream());
    final ZipOutputStream zipOut = new ZipOutputStream(output);

    ZipEntry inEntry;
    while ((inEntry = zipIn.getNextEntry()) != null) {
        final ZipEntry outEntry = new ZipEntry(inEntry.getName());
        zipOut.putNextEntry(outEntry);
        if (inEntry.getName().equals(DOCUMENT_XML)) {
            // The main document part: read it as text and replace the markers.
            final String contentIn = IOUtils.toString(zipIn, UTF_8);
            final String outContent = this.processContent(new StringReader(contentIn));
            IOUtils.write(outContent, zipOut, UTF_8);
        } else {
            // Every other entry (styles, images, ...) is copied unchanged.
            IOUtils.copy(zipIn, zipOut);
        }
        zipOut.closeEntry();
    }
    zipIn.close();
    zipOut.finish();

I'm not proud of it, but it works.
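
For anyone who wants to try the same idea without Spring or Commons IO, here is a minimal sketch using only java.util.zip (Java 9 or later). The class name DocxMarkerReplacer, the methods replaceMarkers and escapeXml, and the marker syntax are all made up for illustration; they are not part of any library. It assumes each marker appears as one contiguous piece of text in word/document.xml.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not a library class: copies a DOCX (ZIP) container
// and replaces literal markers in word/document.xml only.
public class DocxMarkerReplacer {

    private static final String DOCUMENT_XML = "word/document.xml";

    // Reads a DOCX from docxIn and writes the modified copy to docxOut.
    // Both streams are closed when the method returns.
    public static void replaceMarkers(InputStream docxIn, OutputStream docxOut,
                                      Map<String, String> replacements) throws IOException {
        try (ZipInputStream zipIn = new ZipInputStream(docxIn);
             ZipOutputStream zipOut = new ZipOutputStream(docxOut)) {
            ZipEntry inEntry;
            while ((inEntry = zipIn.getNextEntry()) != null) {
                zipOut.putNextEntry(new ZipEntry(inEntry.getName()));
                if (DOCUMENT_XML.equals(inEntry.getName())) {
                    // readAllBytes() stops at the end of the current ZIP entry.
                    String xml = new String(zipIn.readAllBytes(), StandardCharsets.UTF_8);
                    for (Map.Entry<String, String> r : replacements.entrySet()) {
                        // Literal (non-regex) replacement; the value is escaped
                        // so that it cannot break the XML.
                        xml = xml.replace(r.getKey(), escapeXml(r.getValue()));
                    }
                    zipOut.write(xml.getBytes(StandardCharsets.UTF_8));
                } else {
                    // All other entries are copied unchanged.
                    zipIn.transferTo(zipOut);
                }
                zipOut.closeEntry();
            }
        }
    }

    // Escapes the characters that are not allowed in XML text content.
    static String escapeXml(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

A call might look like DocxMarkerReplacer.replaceMarkers(in, out, Map.of("${name}", "Tom & Jerry")). One caveat: Word sometimes splits typed text across several runs in the XML, so a marker that looks contiguous in the editor may not be found by a plain string search. Like the original, this only helps with DOCX, not with the binary DOC format.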

+5

I would suggest the Apache POI library:

http://poi.apache.org/

Looking at it more closely, it seems it has not been updated in a while. Boo! It may still be complete enough to do what you need.

+2

Try the following: http://www.dancrintea.ro/doc-to-pdf/

In addition to replacing strings in MS Word files, it can also:

- read and write Excel files using a simplified API: getCell(x, y) and setCell(x, y, string)
- hide Excel sheets (for example, secondary calculations)
- replace images in DOC, ODT and SXW files
- convert:

doc → pdf, html, txt, rtf
xls → pdf, html, csv
ppt → pdf, swf

0

I would take a look at the Apache POI project. I have used it to work with MS documents in the past.

http://poi.apache.org/

0

Thanks, everyone. I will try http://www.dancrintea.ro/doc-to-pdf/ because I need to convert the classic binary DOC format, not DOCX (ZIP format).

0