You should not use it directly.
Instead, call Tika in the usual way, and it will select the appropriate parser for you.
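For plain-text extraction, the simplest form of "the usual way" is probably the `org.apache.tika.Tika` facade, which by default delegates to `AutoDetectParser`: it sniffs the content type and picks a parser, so your code never names a format-specific parser class. A minimal sketch, assuming tika-core plus the standard parser modules are on the classpath; the class and method names (`TikaFacadeDemo`, `extractText`, `detectType`) are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import org.apache.tika.Tika;

public class TikaFacadeDemo {
    // Extracts plain text; the facade detects the type and chooses the parser.
    // No format-specific parser (e.g. a DOCX parser) is referenced here.
    public static String extractText(byte[] content) throws Exception {
        Tika tika = new Tika();
        try (InputStream in = new ByteArrayInputStream(content)) {
            // parseToString closes the stream itself; the extra close is harmless
            return tika.parseToString(in);
        }
    }

    // Returns the detected media type, e.g. something starting with "text/"
    public static String detectType(byte[] content) {
        return new Tika().detect(content);
    }

    public static void main(String[] args) throws Exception {
        byte[] sample = "Hello from Tika".getBytes(StandardCharsets.UTF_8);
        System.out.println(detectType(sample));
        System.out.println(extractText(sample));
    }
}
```

The same calls work on a real `.docx` stream; only the bytes passed in change.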
If you want XHTML output from parsing the file, the code looks something like this:
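```java
import java.io.File;
import java.io.InputStream;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;

public class TikaToXhtml {
    public static void main(String[] args) throws Exception {
        // Either of these will work, the latter is recommended
        //InputStream input = new FileInputStream("test.docx");
        InputStream input = TikaInputStream.get(new File("test.docx"));

        // AutoDetect is normally best, unless you know the best parser for the type
        Parser parser = new AutoDetectParser();

        // Handler for indented XHTML
        StringWriter sw = new StringWriter();
        SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
        handler.setResult(new StreamResult(sw));

        // Call the Tika Parser
        try {
            Metadata metadata = new Metadata();
            parser.parse(input, handler, metadata, new ParseContext());
            // The handler has written the XHTML into the StringWriter
            String xml = sw.toString();
            System.out.println(xml);
        } finally {
            input.close();
        }
    }
}
```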
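The `TransformerHandler` part of that code is plain JAXP and does not depend on Tika at all: the parser simply pushes SAX events into it, and it serializes them as indented XML. As a rough illustration, assuming only the JDK, this sketch feeds the same kind of handler a few hand-written events of the sort a Tika parser emits; the class name `IndentedXhtmlDemo` and the sample text are made up:

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

public class IndentedXhtmlDemo {
    private static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Builds an indented-XML handler configured like the one above, then
    // drives it with SAX events standing in for a parser's output.
    public static String render() throws Exception {
        StringWriter sw = new StringWriter();
        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
        handler.setResult(new StreamResult(sw));

        AttributesImpl noAttrs = new AttributesImpl();
        handler.startDocument();
        // Bind the default namespace to XHTML so xmlns is written out
        handler.startPrefixMapping("", XHTML);
        handler.startElement(XHTML, "html", "html", noAttrs);
        handler.startElement(XHTML, "body", "body", noAttrs);
        handler.startElement(XHTML, "p", "p", noAttrs);
        char[] text = "Hello from a fake parser".toCharArray();
        handler.characters(text, 0, text.length);
        handler.endElement(XHTML, "p", "p");
        handler.endElement(XHTML, "body", "body");
        handler.endElement(XHTML, "html", "html");
        handler.endPrefixMapping("");
        handler.endDocument();
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render());
    }
}
```

The exact indentation, and whether the XML declaration sits on its own line, varies between JDK versions, so treat the printed layout as illustrative.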
Gagravarr