How to open and manipulate a Word document / template in Java?

I need to open a .doc/.dot/.docx/.dotx document (I'm not picky, I just want it to work), parse it for placeholders (or something similar), insert my own data, and then return the generated document as .doc/.docx/.pdf.

On top of everything else, the tools for this need to be free.

I have looked around for something that fits my needs, but I can't find anything. Tools such as Docmosis, Javadocx, Aspose, etc. are commercial. From what I have read, Apache POI is nowhere near a working implementation of this (they currently have no official developer working on the Word part of the framework).

The only thing that looks like it would do the trick is the OpenOffice UNO API. But that is a pretty big bite for someone who has never used this API (like me).

So, if I'm going to jump into this, I need to make sure I'm on the right track.

Can someone give me some advice on this?

+8
java ms-word
5 answers

I know I asked this question a long time ago, and I said I would post my solution when I finished. So here it is.

I hope this helps someone one day. This is a complete working class; all you have to do is add it to your application and place the TEMPLATE_DIRECTORY_ROOT directory, containing your .docx templates, in the root directory.

Usage is very simple: put placeholders (keys) in your .docx file, then pass in the file name and a Map containing the corresponding key-value pairs for that file.

Enjoy it!

    import java.io.BufferedInputStream;
    import java.io.BufferedOutputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.Closeable;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.URI;
    import java.util.Deque;
    import java.util.Enumeration;
    import java.util.LinkedList;
    import java.util.Map;
    import java.util.UUID;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipFile;
    import java.util.zip.ZipOutputStream;

    import javax.faces.context.ExternalContext;
    import javax.faces.context.FacesContext;
    import javax.servlet.http.HttpServletResponse;

    public class DocxManipulator {

        private static final String MAIN_DOCUMENT_PATH = "word/document.xml";
        private static final String TEMPLATE_DIRECTORY_ROOT = "TEMPLATES_DIRECTORY/";

        /* PUBLIC METHODS */

        /**
         * Generates a .docx document from the given template and substitution
         * data and sends it to the client.
         *
         * @param templateName
         *            File name of the template
         * @param substitutionData
         *            Map of key-value pairs that represent the substitution data
         * @return true on success, false if an I/O error occurred
         */
        public static Boolean generateAndSendDocx(String templateName, Map<String,String> substitutionData) {

            String templateLocation = TEMPLATE_DIRECTORY_ROOT + templateName;

            // Each request works in its own uniquely named temp directory
            String userTempDir = UUID.randomUUID().toString();
            userTempDir = TEMPLATE_DIRECTORY_ROOT + userTempDir + "/";

            try {
                // Unzip .docx file
                unzip(new File(templateLocation), new File(userTempDir));

                // Change data
                changeData(new File(userTempDir + MAIN_DOCUMENT_PATH), substitutionData);

                // Rezip .docx file
                zip(new File(userTempDir), new File(userTempDir + templateName));

                // Send HTTP response
                sendDOCXResponse(new File(userTempDir + templateName), templateName);

                // Clean temp data
                deleteTempData(new File(userTempDir));
            } catch (IOException ioe) {
                System.out.println(ioe.getMessage());
                return false;
            }

            return true;
        }

        /* PRIVATE METHODS */

        /**
         * Unzips the specified ZIP file to the specified directory
         *
         * @param zipfile
         *            Source ZIP file
         * @param directory
         *            Destination directory
         * @throws IOException
         */
        private static void unzip(File zipfile, File directory) throws IOException {
            ZipFile zfile = new ZipFile(zipfile);
            try {
                Enumeration<? extends ZipEntry> entries = zfile.entries();
                while (entries.hasMoreElements()) {
                    ZipEntry entry = entries.nextElement();
                    File file = new File(directory, entry.getName());
                    if (entry.isDirectory()) {
                        file.mkdirs();
                    } else {
                        file.getParentFile().mkdirs();
                        InputStream in = zfile.getInputStream(entry);
                        try {
                            copy(in, file);
                        } finally {
                            in.close();
                        }
                    }
                }
            } finally {
                // Release the file handle on the template
                zfile.close();
            }
        }

        /**
         * Substitutes keys found in the target file with the corresponding data
         *
         * @param targetFile
         *            Target file
         * @param substitutionData
         *            Map of key-value pairs of data
         * @throws IOException
         */
        private static void changeData(File targetFile, Map<String,String> substitutionData) throws IOException {

            // Read the whole file as UTF-8, keeping any line breaks intact
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            copy(targetFile, buffer);
            String docxTemplate = buffer.toString("UTF-8");

            for (Map.Entry<String,String> pair : substitutionData.entrySet()) {
                // Keys without a value are replaced with "NEDOSTAJE" ("missing").
                // Note: values are inserted as-is, without XML escaping.
                String value = pair.getValue() != null ? pair.getValue() : "NEDOSTAJE";
                docxTemplate = docxTemplate.replace(pair.getKey(), value);
            }

            // Overwrite the original file with the substituted content
            OutputStream fos = new FileOutputStream(targetFile);
            try {
                fos.write(docxTemplate.getBytes("UTF-8"));
            } finally {
                fos.close();
            }
        }

        /**
         * Zips the specified directory and all its subdirectories
         *
         * @param directory
         *            Specified directory
         * @param zipfile
         *            Output ZIP file name
         * @throws IOException
         */
        private static void zip(File directory, File zipfile) throws IOException {

            URI base = directory.toURI();
            Deque<File> queue = new LinkedList<File>();
            queue.push(directory);
            OutputStream out = new FileOutputStream(zipfile);
            Closeable res = out;

            try {
                ZipOutputStream zout = new ZipOutputStream(out);
                res = zout;
                while (!queue.isEmpty()) {
                    directory = queue.pop();
                    for (File kid : directory.listFiles()) {
                        String name = base.relativize(kid.toURI()).getPath();
                        if (kid.isDirectory()) {
                            queue.push(kid);
                            name = name.endsWith("/") ? name : name + "/";
                            zout.putNextEntry(new ZipEntry(name));
                        } else {
                            // Skip the output .docx, which is being written into
                            // this same directory (this also skips any other file
                            // with ".docx" in its name)
                            if (kid.getName().contains(".docx")) continue;
                            zout.putNextEntry(new ZipEntry(name));
                            copy(kid, zout);
                            zout.closeEntry();
                        }
                    }
                }
            } finally {
                res.close();
            }
        }

        /**
         * Sends an HTTP response containing the .docx file to the client
         *
         * @param generatedFile
         *            Path to generated .docx file
         * @param fileName
         *            File name of generated file that is being presented to user
         * @throws IOException
         */
        private static void sendDOCXResponse(File generatedFile, String fileName) throws IOException {

            FacesContext facesContext = FacesContext.getCurrentInstance();
            ExternalContext externalContext = facesContext.getExternalContext();
            HttpServletResponse response = (HttpServletResponse) externalContext.getResponse();

            response.reset();
            // MIME type for .docx ("application/msword" is the type for .doc)
            response.setHeader("Content-Type",
                    "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
            response.setHeader("Content-Disposition", "attachment; filename=\"" + fileName + "\"");
            response.setHeader("Content-Length", String.valueOf(generatedFile.length()));

            BufferedInputStream input = new BufferedInputStream(new FileInputStream(generatedFile), 10240);
            try {
                BufferedOutputStream output = new BufferedOutputStream(response.getOutputStream(), 10240);
                try {
                    byte[] buffer = new byte[10240];
                    for (int length; (length = input.read(buffer)) > 0;) {
                        output.write(buffer, 0, length);
                    }
                    output.flush();
                } finally {
                    output.close();
                }
            } finally {
                input.close();
            }

            // Inform JSF not to proceed with rest of life cycle
            facesContext.responseComplete();
        }

        /**
         * Deletes a directory and all its subdirectories
         *
         * @param file
         *            Specified directory
         * @throws IOException
         */
        public static void deleteTempData(File file) throws IOException {

            if (file.isDirectory()) {
                // directory is empty, then delete it
                if (file.list().length == 0) {
                    file.delete();
                } else {
                    // list all the directory contents
                    String files[] = file.list();

                    for (String temp : files) {
                        // construct the file structure
                        File fileDelete = new File(file, temp);
                        // recursive delete
                        deleteTempData(fileDelete);
                    }

                    // check the directory again, if empty then delete it
                    if (file.list().length == 0) {
                        file.delete();
                    }
                }
            } else {
                // if file, then delete it
                file.delete();
            }
        }

        private static void copy(InputStream in, OutputStream out) throws IOException {
            byte[] buffer = new byte[1024];
            while (true) {
                int readCount = in.read(buffer);
                if (readCount < 0) {
                    break;
                }
                out.write(buffer, 0, readCount);
            }
        }

        private static void copy(File file, OutputStream out) throws IOException {
            InputStream in = new FileInputStream(file);
            try {
                copy(in, out);
            } finally {
                in.close();
            }
        }

        private static void copy(InputStream in, File file) throws IOException {
            OutputStream out = new FileOutputStream(file);
            try {
                copy(in, out);
            } finally {
                out.close();
            }
        }
    }
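
One thing to watch with this kind of plain string substitution: the values end up inside word/document.xml, so a value containing `&` or `<` would make the XML invalid and Word would likely refuse to open the result. A minimal sketch of escaping values before they are inserted is below. The class name `PlaceholderSketch`, its method names, and the `missing` parameter are made up for illustration and are not part of the class above.

```java
import java.util.Map;

// Hypothetical helper (not part of DocxManipulator): substitutes placeholders
// and XML-escapes each value before it is inserted into the document XML.
final class PlaceholderSketch {

    /** Escapes the characters that are special in XML text content. */
    static String escapeXml(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Replaces every key in the XML with its escaped value.
     * Keys mapped to null are replaced with the (escaped) 'missing' text.
     */
    static String substitute(String xml, Map<String, String> data, String missing) {
        String result = xml;
        for (Map.Entry<String, String> entry : data.entrySet()) {
            String value = entry.getValue() != null ? entry.getValue() : missing;
            result = result.replace(entry.getKey(), escapeXml(value));
        }
        return result;
    }
}
```

With this sketch, a key mapped to `Smith & Sons` would be written into the XML as `Smith &amp; Sons`, which Word displays as the original text.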
+23

Since a docx file is just a zip archive of XML files (plus any binary files for embedded objects such as images), we met this requirement by unzipping the archive, passing document.xml to a template engine (we used FreeMarker), which does the merge for us, and then zipping the output back up to get a new docx file.

The template document is then just a regular docx with embedded FreeMarker expressions/directives, and it can be edited in Word.

Since (un)zipping can be done with the JDK and FreeMarker is open source, you don't incur any license fees, not even for Word itself.
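
The unzip / merge / re-zip flow described above can be sketched with the JDK alone, streaming from one zip to another in memory. This is only an illustration under assumptions: the class `DocxMergeSketch` and its methods are hypothetical, and `render` is a plain string replace standing in for the actual FreeMarker call, which is not shown here. It also assumes non-null values and does not XML-escape them.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch: copies a .docx (zip) entry by entry and rewrites
// only word/document.xml on the way through.
final class DocxMergeSketch {

    private static final String MAIN_PART = "word/document.xml";

    /** Reads the template zip, merges the data into the main part, writes a new zip. */
    static void merge(InputStream template, OutputStream result, Map<String, String> data)
            throws IOException {
        // Note: closing the zip streams also closes the streams passed in.
        try (ZipInputStream zin = new ZipInputStream(template);
             ZipOutputStream zout = new ZipOutputStream(result)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                // A fresh ZipEntry avoids reusing the old entry's size/CRC values
                zout.putNextEntry(new ZipEntry(entry.getName()));
                if (!entry.isDirectory()) {
                    byte[] content = readAll(zin);
                    if (MAIN_PART.equals(entry.getName())) {
                        String xml = new String(content, StandardCharsets.UTF_8);
                        content = render(xml, data).getBytes(StandardCharsets.UTF_8);
                    }
                    zout.write(content);
                }
                zout.closeEntry();
            }
        }
    }

    /** Stand-in for the template engine: replaces ${key} with its value. */
    static String render(String xml, Map<String, String> data) {
        String out = xml;
        for (Map.Entry<String, String> e : data.entrySet()) {
            out = out.replace("${" + e.getKey() + "}", e.getValue());
        }
        return out;
    }

    /** Reads the rest of the current zip entry into a byte array. */
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }
}
```

Working stream to stream avoids the temporary directory entirely, which may be convenient on a server; swapping `render` for a real template engine call would be the only change needed to follow the approach in this answer more closely.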

The limitation is that this approach can only generate docx or rtf files, and the output document will have the same file type as the template. If you need to convert the document to another format (for example, PDF), you will have to solve this problem separately.

+4

In the end, I relied on Apache POI 3.12 and processed the paragraphs (separately extracting the paragraphs from tables, headers and footers, and footnotes as well, since those paragraphs are not returned by XWPFDocument.getParagraphs()).

The processing code (~100 lines) and unit tests are here on GitHub.

+3

I was in more or less the same situation as you: I had to modify a whole bunch of MS Word merge templates at once. After a lot of searching for a Java solution, I finally installed Visual Studio 2010 Express, which is free, and did the job in C#.

0

I recently encountered a similar problem: I needed a tool that accepts a ".docx" template file, processes it by evaluating the parameters passed in, and outputs a ".docx" file as the result.

Finally, God brought us scriptlet4docx :). The key features of this product are:

1. Groovy code embedded as scriptlets in the template file (parameter insertion, etc.)
2. Looping over collection items in a table

and many other features. However, when I checked, the last commit to the project was made about a year ago, so there is a chance that the project will not receive new features or bug fixes. Whether to use it or not is your choice.

0
