How to use the Tika XWPFWordExtractorDecorator class?

Someone told me that Tika's XWPFWordExtractorDecorator class is used to convert docx to HTML, but I'm not sure how to use it to get HTML from a docx file. Suggestions for any other library that does the same job are also appreciated.

java apache-poi
1 answer

You should not use it directly.

Instead, call Tika in the usual way and it will select the appropriate code for you.

If you want XHTML output from the file, the code looks something like this:

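    import java.io.File;
    import java.io.InputStream;
    import java.io.StringWriter;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.sax.SAXTransformerFactory;
    import javax.xml.transform.sax.TransformerHandler;
    import javax.xml.transform.stream.StreamResult;
    import org.apache.tika.io.TikaInputStream;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.parser.Parser;

    // Either of these will work, the latter is recommended
    //InputStream input = new FileInputStream("test.docx");
    InputStream input = TikaInputStream.get(new File("test.docx"));

    // AutoDetect is normally best, unless you know the best parser for the type
    Parser parser = new AutoDetectParser();

    // Handler for indented XHTML
    StringWriter sw = new StringWriter();
    SAXTransformerFactory factory =
            (SAXTransformerFactory) SAXTransformerFactory.newInstance();
    TransformerHandler handler = factory.newTransformerHandler();
    handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
    handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
    handler.setResult(new StreamResult(sw));

    // Call the Tika Parser
    try {
        Metadata metadata = new Metadata();
        parser.parse(input, handler, metadata, new ParseContext());
        // The serialized XHTML is now available as a string
        String xml = sw.toString();
    } finally {
        input.close();
    }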
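The handler in the code above is plain JAXP and does not depend on Tika: the parser only pushes SAX events into it, and the handler serializes them into the StringWriter. As a rough sketch of that half in isolation, the following feeds a few hand-written SAX events through the same handler setup. The class name SaxToXhtmlSketch, the render helper, and the html/body/p events are made up for illustration and only stand in for what a real parser would emit.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

public class SaxToXhtmlSketch {

    // Hypothetical helper: serializes a tiny hand-made document to a string,
    // using the same handler configuration as the answer above.
    public static String render(String paragraphText) throws Exception {
        StringWriter sw = new StringWriter();
        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
        // The result must be set before any SAX events are sent
        handler.setResult(new StreamResult(sw));

        // Stand-in for the events a parser would send for one paragraph
        AttributesImpl noAttrs = new AttributesImpl();
        char[] chars = paragraphText.toCharArray();
        handler.startDocument();
        handler.startElement("", "html", "html", noAttrs);
        handler.startElement("", "body", "body", noAttrs);
        handler.startElement("", "p", "p", noAttrs);
        handler.characters(chars, 0, chars.length);
        handler.endElement("", "p", "p");
        handler.endElement("", "body", "body");
        handler.endElement("", "html", "html");
        handler.endDocument();

        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render("Hello from a docx paragraph"));
    }
}
```

Running main prints a small XML document with the paragraph text inside a p element. In the real code, parser.parse(...) plays the role of the hand-written startElement/characters/endElement calls.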