I have a directory into which another process unpacks files.
Our current Storm implementation lists this directory, selects the oldest file, and opens a reader on it. The reader is held as a field inside the spout, so each call to nextTuple() emits one line from the file. When the spout has finished reading a file, it closes the reader and opens a new reader on the next file.
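For context, the logic described above can be sketched in plain Java (without the Storm or HDFS APIs, so it runs standalone): pick the oldest unread file in the directory, emit one line per nextTuple() call, and move on to the next file when the current one is exhausted. The class and method names here are illustrative, not our actual code.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

public class OldestFileSpoutSketch {
    private BufferedReader reader;          // reader held as a field, as in the spout
    private final Path dir;
    private final Set<Path> done = new HashSet<>();

    OldestFileSpoutSketch(Path dir) { this.dir = dir; }

    // Simplified nextTuple(): emit one line per call, switching to the
    // oldest unread file when the current one is exhausted.
    String nextTuple() throws IOException {
        while (true) {
            if (reader == null && !openOldest()) return null; // nothing left to read
            String line = reader.readLine();
            if (line != null) return line;
            reader.close();   // current file finished: close and pick the next one
            reader = null;
        }
    }

    private boolean openOldest() throws IOException {
        Optional<Path> oldest;
        try (var files = Files.list(dir)) {
            oldest = files.filter(p -> !done.contains(p))
                          .min(Comparator.comparingLong(p -> p.toFile().lastModified()));
        }
        if (oldest.isEmpty()) return false;
        done.add(oldest.get());
        reader = Files.newBufferedReader(oldest.get());
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("spout");
        Path a = dir.resolve("a.txt"), b = dir.resolve("b.txt");
        Files.writeString(a, "first\n");
        Files.writeString(b, "second\n");
        a.toFile().setLastModified(1000L);  // make a.txt the oldest file
        b.toFile().setLastModified(2000L);
        OldestFileSpoutSketch spout = new OldestFileSpoutSketch(dir);
        System.out.println(spout.nextTuple()); // line from the oldest file
        System.out.println(spout.nextTuple()); // then the next file
        System.out.println(spout.nextTuple()); // null when the directory is drained
    }
}
```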
To increase throughput, the idea is to run several spouts that read several files at once. Since these spouts would compete for the same files in the same directory, is there a way for the spouts to communicate so they can agree on which files each one reads? (Or to have a central manager that distributes files to the spouts.)
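One way I am considering (not sure if it is idiomatic for Storm): avoid communication entirely by having each spout instance compute the same deterministic partition of filenames, e.g. by hashing the filename modulo the number of spout tasks, so every file is claimed by exactly one spout. The class and method names below are hypothetical; in Storm the task index and task count would presumably come from the TopologyContext passed to open().

```java
import java.util.*;

public class FilePartitionSketch {
    // Deterministically assign a file to one of numTasks spout instances.
    // Every spout computes the same assignment independently, so no
    // inter-spout communication or locking is needed.
    static int ownerOf(String fileName, int numTasks) {
        return Math.floorMod(fileName.hashCode(), numTasks);
    }

    // Filter a directory listing down to the files owned by this task.
    static List<String> filesFor(List<String> files, int taskIndex, int numTasks) {
        List<String> mine = new ArrayList<>();
        for (String f : files)
            if (ownerOf(f, numTasks) == taskIndex) mine.add(f);
        return mine;
    }

    public static void main(String[] args) {
        List<String> files = List.of("a.txt", "b.txt", "c.txt", "d.txt");
        for (int task = 0; task < 2; task++)
            System.out.println("task " + task + " -> " + filesFor(files, task, 2));
    }
}
```

The trade-off: this needs no coordination, but files are spread by hash rather than age, so no single spout sees a globally "oldest first" order.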
The directory and files are stored in, and read from, HDFS.
java apache-storm
Micky