I had the same requirement and chose my "directory hash" to be an MD5 hash of the concatenated streams of all the (non-directory) files within the directory. As crozin mentioned in the comments on a similar question, you can use SequenceInputStream to act as a single stream that concatenates a load of other streams. I'm using Apache Commons Codec for the MD5 algorithm.
Basically, you recurse through the directory tree, adding a FileInputStream to a Vector for each non-directory file. Vector is convenient here because its elements() method provides the Enumeration that SequenceInputStream needs to loop through. To the MD5 algorithm, this just appears as one InputStream.
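To make the Vector / elements() / SequenceInputStream combination concrete, here is a minimal sketch that uses in-memory streams instead of files. The class name SequenceStreamDemo and the helper readConcatenated are made up for illustration only; the JDK calls are the standard ones.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Vector;

// Hypothetical demo class: shows several streams being read back as one.
public class SequenceStreamDemo {

    // Wraps each string in its own stream, then reads them all through a single SequenceInputStream.
    static String readConcatenated(String... parts) {
        Vector<InputStream> streams = new Vector<>();
        for (String part : parts) {
            streams.add(new ByteArrayInputStream(part.getBytes(StandardCharsets.UTF_8)));
        }
        // elements() supplies the Enumeration that SequenceInputStream walks through in order.
        try (InputStream combined = new SequenceInputStream(streams.elements())) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int n;
            while ((n = combined.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The consumer only ever sees one continuous stream.
        System.out.println(readConcatenated("abc", "def"));
    }
}
```

Whatever consumes the combined stream, a digest function included, has no way of telling where one underlying stream ends and the next begins.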
One gotcha is that the files must be presented in the same order every time, or the hash will not be the same for the same inputs. The listFiles() method in File does not guarantee any ordering, so I sort by file name.
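A small sketch of why the ordering matters, using the JDK's own MessageDigest rather than Commons Codec and plain strings standing in for file contents. The class OrderMattersDemo and both helper methods are hypothetical names for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class: the same contents in a different order give a different digest.
public class OrderMattersDemo {

    // MD5 (lowercase hex) of the given contents, fed to the digest in list order.
    static String md5Hex(Iterable<String> contents) {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        for (String content : contents) {
            md5.update(content.getBytes(StandardCharsets.UTF_8));
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    // Sorts the entries by name first (TreeMap uses String's natural order), then hashes the contents.
    static String md5HexSortedByName(Map<String, String> contentsByName) {
        return md5Hex(new TreeMap<>(contentsByName).values());
    }

    public static void main(String[] args) {
        List<String> oneOrder = Arrays.asList("a", "bc");
        List<String> otherOrder = Arrays.asList("bc", "a");
        System.out.println(md5Hex(oneOrder).equals(md5Hex(otherOrder))); // false: order changes the hash

        Map<String, String> listing = new LinkedHashMap<>();
        listing.put("b.txt", "bc"); // deliberately inserted out of name order
        listing.put("a.txt", "a");
        System.out.println(md5HexSortedByName(listing).equals(md5Hex(oneOrder))); // true: sorting restores it
    }
}
```

Note that comparing names with String.compareTo is case-sensitive and locale-independent, which is what you want for a reproducible hash.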
I was doing this for files under SVN control and wanted to avoid hashing the hidden SVN files, so I added a flag to skip hidden files.
The relevant basic code is below. (Obviously, it could be "hardened".)
    import org.apache.commons.codec.digest.DigestUtils;

    import java.io.*;
    import java.util.*;

    // These two methods are meant to live inside a class of your own.

    public String calcMD5HashForDir(File dirToHash, boolean includeHiddenFiles) {

        assert (dirToHash.isDirectory());
        Vector<FileInputStream> fileStreams = new Vector<FileInputStream>();

        System.out.println("Found files for hashing:");
        collectInputStreams(dirToHash, fileStreams, includeHiddenFiles);

        // try-with-resources closes the combined stream even if hashing fails
        try (SequenceInputStream seqStream =
                 new SequenceInputStream(fileStreams.elements())) {
            return DigestUtils.md5Hex(seqStream);
        }
        catch (IOException e) {
            throw new RuntimeException("Error reading files to hash in "
                                       + dirToHash.getAbsolutePath(), e);
        }
    }

    private void collectInputStreams(File dir,
                                     List<FileInputStream> foundStreams,
                                     boolean includeHiddenFiles) {

        File[] fileList = dir.listFiles();
        if (fileList == null) {
            // listFiles() returns null if the directory cannot be read
            throw new RuntimeException("Cannot list files in " + dir.getAbsolutePath());
        }
        Arrays.sort(fileList,               // Need in reproducible order
                    new Comparator<File>() {
                        public int compare(File f1, File f2) {
                            return f1.getName().compareTo(f2.getName());
                        }
                    });

        for (File f : fileList) {
            if (!includeHiddenFiles && f.getName().startsWith(".")) {
                // Skip it
            }
            else if (f.isDirectory()) {
                collectInputStreams(f, foundStreams, includeHiddenFiles);
            }
            else {
                try {
                    System.out.println("\t" + f.getAbsolutePath());
                    foundStreams.add(new FileInputStream(f));
                }
                catch (FileNotFoundException e) {
                    throw new AssertionError(e.getMessage()
                                + ": file should never not be found!");
                }
            }
        }
    }
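One possible direction for hardening, sketched under my own assumptions: the code above opens a FileInputStream for every file before any hashing starts, which could run into open-file limits on a large tree. The variant below swaps in the JDK's MessageDigest for Commons Codec and opens one file at a time. The class name StreamingDirHash and its method names are hypothetical. Because it feeds the same bytes in the same sorted order, it should produce the same digest as hashing the concatenated stream.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical variant: same hash definition, but only one file is open at any time.
public class StreamingDirHash {

    public static String md5HexOfDir(File dir, boolean includeHiddenFiles) {
        if (!dir.isDirectory()) {
            throw new IllegalArgumentException("Not a directory: " + dir);
        }
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        try {
            feed(dir, md5, includeHiddenFiles);
        } catch (IOException e) {
            throw new UncheckedIOException(
                "Error reading files to hash in " + dir.getAbsolutePath(), e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    // Walks the tree in name order and pushes each file's bytes into the digest.
    private static void feed(File dir, MessageDigest md5, boolean includeHiddenFiles)
            throws IOException {
        File[] entries = dir.listFiles();
        if (entries == null) {
            throw new IOException("Cannot list " + dir.getAbsolutePath());
        }
        Arrays.sort(entries, Comparator.comparing(File::getName)); // reproducible order
        byte[] buffer = new byte[8192];
        for (File f : entries) {
            if (!includeHiddenFiles && f.getName().startsWith(".")) {
                continue; // skip hidden entries such as .svn
            }
            if (f.isDirectory()) {
                feed(f, md5, includeHiddenFiles);
            } else {
                try (InputStream in = new FileInputStream(f)) { // closed before the next file opens
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        md5.update(buffer, 0, n);
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(md5HexOfDir(new File(args[0]), false));
    }
}
```

Like the original, this hashes file contents only, so renaming a file changes the result only if the rename alters the sort order.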