Apache Tika and file access instead of a Java InputStream

I want to create a new Tika parser to extract metadata from a file. We are already using Tika, and metadata extraction will be performed sequentially.

I think I have run into this known Tika issue / improvement request:

Allow passing of files or memory buffers to parsers

I have a C++ console executable that takes the path to an input file and prints the metadata it finds, one name/value pair per line.
The C++ code relies on libraries that expect a file path when accessing the data, and rewriting the executable in Java is not an option. I thought it would be fairly easy to hook it up to Tika. But a Tika parser must be written in Java, and the parse method that has to be overridden takes an already open input stream:

void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)

So I assume the only solution is to take the input stream, write it to a temporary file, run the executable on that file, and finally delete it. I don't like messing around with temporary files, because then I have to worry about cleaning them up if something goes wrong and a file is not deleted.
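For reference, the temporary-file approach described above could be sketched with the standard library alone, as below. The class name `TempFileSpool`, the `PathAction` callback and the `tika-spool-` prefix are hypothetical names chosen for this illustration; the callback stands in for "run the executable on this path". The `finally` block is what covers the cleanup worry when processing fails.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class TempFileSpool {
    /** Hypothetical callback: whatever needs a real file path. */
    interface PathAction<T> {
        T apply(Path file) throws IOException;
    }

    /** Copies the stream to a temp file, runs the action, always deletes the file. */
    static <T> T withTempFile(InputStream in, PathAction<T> action) throws IOException {
        Path tmp = Files.createTempFile("tika-spool-", ".tmp");
        try {
            // createTempFile already created the file, so overwrite it.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return action.apply(tmp);
        } finally {
            // Runs on success and on exceptions thrown by the copy or the action.
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        long size = withTempFile(in, Files::size);
        System.out.println(size); // prints 5
    }
}
```

This does not remove the file if the JVM is killed while the action is running, so it narrows the cleanup problem without fully solving it.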

Does anyone have a clever idea of how to deal with something like this?

+5
2 answers

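Take a look at TikaInputStream, which Tika provides for exactly this kind of case. It can be created from either a File or an InputStream, and it can hand a file back to code that needs one. If the input was a file to begin with, that file is used directly; if not, the stream is spooled to a temporary file for you.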

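In the Java code that wraps your executable, keep the standard parse method. A TikaInputStream is itself an InputStream, so whether the caller starts from a file or from a plain stream, the input still reaches your Parser as an InputStream.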

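Then, inside the parser, turn the InputStream into a TikaInputStream (if it already is one it is returned as-is, otherwise it is wrapped), ask it for the file, and pass that path to your C++ executable.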

+5

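Alternatively, you could launch the C++ executable from Java with Runtime.exec, which gives you a Process whose InputStream carries the program's output. Would that work for you?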
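A minimal sketch of the Runtime.exec route, using ProcessBuilder (the standard-library equivalent) to start the tool and read its output through the Process's InputStream. The question only says that each output line holds a name/value pair, so the `=` separator is an assumption, and `ExternalMetadataTool` and the `extract-metadata` command in the comment are hypothetical names. The tool still needs a file path as its argument.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

class ExternalMetadataTool {
    /** Parses "name=value" lines; the '=' separator is an assumption. */
    static Map<String, String> parsePairs(Reader output) throws IOException {
        Map<String, String> pairs = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(output);
        String line;
        while ((line = reader.readLine()) != null) {
            int sep = line.indexOf('=');
            if (sep <= 0) {
                continue; // skip blank lines and lines without a name
            }
            pairs.put(line.substring(0, sep).trim(), line.substring(sep + 1).trim());
        }
        return pairs;
    }

    /** Runs the executable on a file path and parses what it prints. */
    static Map<String, String> run(String executable, Path file)
            throws IOException, InterruptedException {
        // e.g. run("extract-metadata", path), where the command name is hypothetical
        Process process = new ProcessBuilder(executable, file.toString())
                .redirectErrorStream(true) // merge stderr so the child cannot block on it
                .start();
        Map<String, String> pairs;
        try (Reader out = new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8)) {
            pairs = parsePairs(out);
        }
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("tool exited with code " + exit);
        }
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        String sample = "title=Report\nauthor = Ann\n\nnoise";
        System.out.println(parsePairs(new StringReader(sample))); // prints {title=Report, author=Ann}
    }
}
```

Inside a Tika parser, each parsed pair could then be copied into the Metadata object passed to parse.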

+1
