Getting a file's hash: performance / optimization

I am trying to hash files as quickly as possible. My program hashes large data sets (100 GB+) made up of files of arbitrary size (from a few kilobytes to 5 GB each), numbering anywhere from a handful to several hundred thousand.

The program must support all of the hash algorithms Java supports (MD2, MD5, SHA-1, SHA-256, SHA-384, SHA-512).

I am currently using:

/**
 * Gets Hash of file.
 *
 * @param file String path + filename of file to get hash.
 * @param hashAlgo Hash algorithm to use. <br/>
 *     Supported algorithms are: <br/>
 *     MD2, MD5 <br/>
 *     SHA-1 <br/>
 *     SHA-256, SHA-384, SHA-512
 * @return String value of hash. (Variable length dependent on hash algorithm used)
 * @throws IOException If file is invalid.
 * @throws HashTypeException If no supported or valid hash algorithm was found.
 */
public String getHash(String file, String hashAlgo) throws IOException, HashTypeException {
    try {
        MessageDigest md = MessageDigest.getInstance(validateHashType(hashAlgo));
        // try-with-resources closes the stream even if a read fails
        try (FileInputStream fis = new FileInputStream(file)) {
            byte[] dataBytes = new byte[1024];
            int nread;
            while ((nread = fis.read(dataBytes)) != -1) {
                md.update(dataBytes, 0, nread);
            }
        }
        byte[] mdbytes = md.digest();
        StringBuffer hexString = new StringBuffer();
        for (int i = 0; i < mdbytes.length; i++) {
            // Pad to two digits so bytes below 0x10 keep their leading zero
            String hex = Integer.toHexString(0xFF & mdbytes[i]);
            if (hex.length() == 1) {
                hexString.append('0');
            }
            hexString.append(hex);
        }
        return hexString.toString();
    } catch (NoSuchAlgorithmException | HashTypeException e) {
        throw new HashTypeException("Unsupported Hash Algorithm.", e);
    }
}

Is there a better way to hash files? I am looking for extreme performance, and I'm not sure I have gone about this in the best way.

+3
2 answers

I see a number of potential performance improvements. One is to use StringBuilder instead of StringBuffer; it is source-compatible but faster because it is not synchronized. The second (much more important) is to use FileChannel and the java.nio API instead of FileInputStream, or at least to wrap the FileInputStream in a BufferedInputStream to optimize I/O.
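As a rough illustration of the FileChannel suggestion, here is a minimal sketch. The class name NioHasher, the method name hash, and the 64 KB direct buffer are illustrative assumptions, not part of the original code; whether this actually beats a BufferedInputStream, and which buffer size works best, would need to be benchmarked on the target disks.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; the name is illustrative only.
final class NioHasher {
    // Assumed buffer size; tune by measurement.
    private static final int BUFFER_SIZE = 64 * 1024;

    static String hash(Path file, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        ByteBuffer buffer = ByteBuffer.allocateDirect(BUFFER_SIZE);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            while (channel.read(buffer) != -1) {
                buffer.flip();      // switch from filling to draining
                md.update(buffer);  // consumes all remaining bytes
                buffer.clear();     // make the buffer ready for the next read
            }
        }
        byte[] digest = md.digest();
        // StringBuilder is unsynchronized; two hex digits per byte
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            hex.append(Character.forDigit((b >> 4) & 0xF, 16));
            hex.append(Character.forDigit(b & 0xF, 16));
        }
        return hex.toString();
    }
}
```

A call might look like NioHasher.hash(Paths.get("data.bin"), "SHA-256"), where data.bin stands in for any file. Validation of the algorithm name (validateHashType in the question) is left out here for brevity.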

+5

In addition to Ernest's answer: I think the result of MessageDigest.getInstance(validateHashType(hashAlgo)) can be cached in a thread-local hash map, with validateHashType(hashAlgo) as the key. Creating a MessageDigest takes time, but an instance can be reused: after retrieving it from the map, call its reset() method before feeding it new data.

See the Javadoc for java.lang.ThreadLocal.
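A minimal sketch of that caching idea follows. The class name DigestCache and the method get are hypothetical, and the map is keyed directly by the algorithm name on the assumption that it has already been validated (for example by validateHashType in the question).

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class; the name is illustrative only.
final class DigestCache {
    // One map per thread, because MessageDigest instances are not thread-safe.
    private static final ThreadLocal<Map<String, MessageDigest>> CACHE =
            ThreadLocal.withInitial(HashMap::new);

    static MessageDigest get(String algorithm) throws NoSuchAlgorithmException {
        Map<String, MessageDigest> perThread = CACHE.get();
        MessageDigest md = perThread.get(algorithm);
        if (md == null) {
            // First request for this algorithm on this thread: create and store it
            md = MessageDigest.getInstance(algorithm);
            perThread.put(algorithm, md);
        }
        md.reset(); // discard any state left over from a previous file
        return md;
    }
}
```

Each worker thread would then call DigestCache.get(algorithm) once per file instead of MessageDigest.getInstance. How much time this saves depends on the provider and on the file sizes involved, so it is worth measuring before relying on it.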

+1