Very slow to generate MD5 for a large file using Java

I use Java to generate an MD5 hash for some files. I need to create one MD5 for several files with a total size of about 1 gigabyte. Here is my code:

    private String generateMD5(SequenceInputStream inputStream) {
        if (inputStream == null) {
            return null;
        }
        MessageDigest md;
        try {
            int read = 0;
            byte[] buf = new byte[2048];
            md = MessageDigest.getInstance("MD5");
            while ((read = inputStream.read(buf)) > 0) {
                md.update(buf, 0, read);
            }
            byte[] hashValue = md.digest();
            return new String(hashValue);
        } catch (NoSuchAlgorithmException e) {
            return null;
        } catch (IOException e) {
            return null;
        } finally {
            try {
                if (inputStream != null) inputStream.close();
            } catch (IOException e) {
                // ...
            }
        }
    }

It seems to run forever. How can I make it more efficient?

+7
3 answers

You might want to use the Fast MD5 library. It is much faster than Java's built-in MD5 provider, and getting a hash is as simple as:

 String hash = MD5.asHex(MD5.getHash(new File(filename))); 

Keep in mind that the slowness can also be caused by slow file I/O rather than by the hashing itself.
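
If file I/O is the suspect, one thing to try with the standard `MessageDigest` API is a larger read buffer than the 2048 bytes used in the question. The following is a minimal sketch, not a benchmarked solution: the class name `Md5Sketch`, the helper names, and the 64 KiB buffer size are assumptions chosen for illustration. It also returns the digest as a hex string, because `new String(hashValue)` decodes the raw digest bytes as text and does not yield a printable hash.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; the names here are illustrative only.
final class Md5Sketch {
    // 64 KiB is an arbitrary illustrative size, not a measured optimum.
    private static final int BUFFER_SIZE = 64 * 1024;

    /** Reads the stream to its end and returns the MD5 digest as 32 lowercase hex characters. */
    static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[BUFFER_SIZE];
        int read;
        // read() returns -1 once the end of the stream is reached.
        while ((read = in.read(buf)) != -1) {
            md.update(buf, 0, read);
        }
        return toHex(md.digest());
    }

    /** Converts raw bytes to a lowercase hex string, two characters per byte. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Because `SequenceInputStream` is an `InputStream`, the stream from the question can be passed to `md5Hex` unchanged. Whether the larger buffer helps depends on the disk and the JVM, so it is worth measuring before and after.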

+18

I rewrote your code using NIO. The code is as follows:

    private static String generateMD5(FileInputStream inputStream) {
        if (inputStream == null) {
            return null;
        }
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
            FileChannel channel = inputStream.getChannel();
            ByteBuffer buff = ByteBuffer.allocate(2048);
            while (channel.read(buff) != -1) {
                buff.flip();
                md.update(buff);
                buff.clear();
            }
            byte[] hashValue = md.digest();
            return new String(hashValue);
        } catch (NoSuchAlgorithmException e) {
            return null;
        } catch (IOException e) {
            return null;
        } finally {
            try {
                if (inputStream != null) inputStream.close();
            } catch (IOException e) {
            }
        }
    }

On my machine, it takes about 30 seconds to generate the MD5 hash for a large file. I also tested your original code, and the results indicate that NIO does not improve performance.

Then I measured the time spent on I/O and on the MD5 computation separately. The measurements show that slow file I/O is the bottleneck: I/O takes about 5/6 of the total time.
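
A measurement like the one described above could be sketched roughly as follows. This is not the code that produced the numbers in this answer; the class name `Md5Timing` and the method name are hypothetical, and `System.nanoTime()` around each call adds some overhead of its own, so the split is only approximate.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical timing helper; names are illustrative only.
final class Md5Timing {
    /** Returns { nanoseconds spent in read(), nanoseconds spent in update() and digest() }. */
    static long[] timeReadAndDigest(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[2048]; // same buffer size as the code in this thread
        long ioNanos = 0;
        long md5Nanos = 0;
        while (true) {
            long beforeRead = System.nanoTime();
            int read = in.read(buf);
            long afterRead = System.nanoTime();
            ioNanos += afterRead - beforeRead;
            if (read == -1) {
                break; // end of stream
            }
            md.update(buf, 0, read);
            md5Nanos += System.nanoTime() - afterRead;
        }
        long beforeDigest = System.nanoTime();
        md.digest();
        md5Nanos += System.nanoTime() - beforeDigest;
        return new long[] { ioNanos, md5Nanos };
    }
}
```

Comparing the two values for the same file gives a rough idea of whether a faster MD5 implementation or faster reads would pay off more.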

Using the Fast MD5 library mentioned by @Sticky, it takes only 15 seconds to generate the MD5 hash, which is a big improvement.

+11

If speed matters and you are loading a file from a URL and want to calculate its MD5 at the same time (i.e. without saving the file and then opening and reading it again to get its MD5), my solution at stack overflow .site / questions / 11667 / ... may be helpful. It is based on Bloodwulf's code snippet in this thread (thanks!) and extends it a bit.
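
For illustration only, one common way to hash data while it is being read is `java.security.DigestInputStream`, which updates a `MessageDigest` as bytes pass through it. The sketch below is not the linked solution (its contents are not shown here); the class name `StreamingMd5` and the method name `copyAndDigest` are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; not the code from the linked answer.
final class StreamingMd5 {
    /** Copies everything from in to out and returns the raw MD5 digest of the copied bytes. */
    static byte[] copyAndDigest(InputStream in, OutputStream out)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        // Every byte read through din is also fed into md.
        DigestInputStream din = new DigestInputStream(in, md);
        byte[] buf = new byte[8192];
        int read;
        while ((read = din.read(buf)) != -1) {
            out.write(buf, 0, read);
        }
        return md.digest();
    }
}
```

Here `in` could be the stream returned by `java.net.URL.openStream()` and `out` a `FileOutputStream`, so the download is written and hashed in a single pass. The caller remains responsible for closing both streams.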

0
