I have two processes. One of them writes files to HDFS, and the other downloads these files.
The first process (the one that writes the file) uses:
    private void writeFileToHdfs(byte[] sourceStream, Path outFilePath) {
        FSDataOutputStream out = null;
        try {
            out = getFileSystem().create(outFilePath);
            out.write(sourceStream);
        } catch (Exception e) {
            LOG.error("Error while trying to write a file to hdfs", e);
        } finally {
            try {
                if (null != out) {
                    out.close();
                }
            } catch (IOException e) {
                LOG.error("Could not close output stream to hdfs", e);
            }
        }
    }
The second process reads these files for further processing. A file is first created and only then filled with content. This takes time (only a few milliseconds, but still), and during that window the second process can pick up the file before it has been completely written and closed.
Note that HDFS does not keep lock information in the NameNode, so there is no daemon that can check whether a file is locked before it is accessed.
I wonder how best to solve this problem.
Here are my thoughts:
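One pattern that is often used for this kind of race is to write the file under a temporary name and rename it to its final name only after the stream is closed, while the reader skips anything that still carries the temporary suffix. The following is a minimal sketch of that idea, not a definitive implementation. It uses java.nio on a local filesystem as a stand-in so that it runs without a cluster; on HDFS the corresponding calls would be FileSystem.create for the write and FileSystem.rename(Path, Path) for the move. The class name AtomicPublish, the ".tmp" suffix, and both method names are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

class AtomicPublish {
    // Hypothetical suffix marking files that are still being written.
    static final String TMP_SUFFIX = ".tmp";

    // Writer side: fill a temporary file, close it, then rename it into place.
    // On HDFS this would be FileSystem.create(tmp) followed by FileSystem.rename(tmp, finalPath).
    static void publish(byte[] data, Path finalPath) throws IOException {
        Path tmp = finalPath.resolveSibling(finalPath.getFileName() + TMP_SUFFIX);
        Files.write(tmp, data); // opens, writes, and closes the temporary file
        Files.move(tmp, finalPath, StandardCopyOption.ATOMIC_MOVE);
    }

    // Reader side: return only regular files that already have their final name.
    static List<Path> listCompleted(Path dir) throws IOException {
        List<Path> done = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                boolean inProgress = p.getFileName().toString().endsWith(TMP_SUFFIX);
                if (Files.isRegularFile(p) && !inProgress) {
                    done.add(p);
                }
            }
        }
        return done;
    }
}
```

With this layout the reader never needs to know whether a writer is active: a file is either invisible to it (temporary name) or complete (final name). The trade-off is that both processes have to agree on the suffix convention, and leftover temporary files from a crashed writer need to be cleaned up separately.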