What hash to use for file uniqueness in Java

I am trying to track a set of files that can share the same name and metadata. I would like to use a hash to tell them apart and serve as a unique identifier, but I'm not sure which algorithm to use. The files are relatively small (around 100 KB), and I would like each one to hash in under 10 seconds. Which hash (built into Java 1.5) best suits my needs?


Note that such a hash will never be guaranteed unique, although with a good one you stand a very good chance of never encountering a collision.
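To put "a very good chance" in numbers, here is a small sketch using the standard birthday-bound approximation p ≈ n²/2^(b+1) for n files and a b-bit digest. The count of one million files is a hypothetical example, and the estimate only covers accidental collisions, not files deliberately crafted to collide.

```java
public class CollisionOdds {

    /**
     * Birthday-bound approximation of the chance that at least two of n
     * distinct files share a digest of the given bit length:
     * p ~= n^2 / 2^(bits + 1). Only meaningful while the result is small,
     * and only for accidental (non-adversarial) collisions.
     */
    static double approxCollisionProbability(double n, int bits) {
        return (n * n) / Math.pow(2.0, bits + 1);
    }

    public static void main(String[] args) {
        double files = 1e6; // hypothetical: one million tracked files
        System.out.println("MD5   (128-bit): " + approxCollisionProbability(files, 128));
        System.out.println("SHA-1 (160-bit): " + approxCollisionProbability(files, 160));
    }
}
```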

If security is not a concern (i.e. nobody is deliberately trying to create colliding files), MD5 is good enough.

Any of the SHA hashes should handle 100 KB in well under 10 seconds; SHA-1 is somewhat slower than MD5, but still fast.

Use java.security.MessageDigest to compute the hash.
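A minimal sketch of the MessageDigest API: feeding data in chunks with update() and then calling digest() gives the same result as hashing everything in one call, which is what makes buffered file reading work. The class and method names are illustrative.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class DigestChunks {

    // One-shot: hash the whole array in a single call.
    static byte[] oneShot(byte[] data) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("SHA-1").digest(data);
    }

    // Incremental: feed the same bytes in fixed-size chunks via update(),
    // as you would when reading a file through a buffer.
    static byte[] chunked(byte[] data, int chunkSize) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            md.update(data, off, len);
        }
        return md.digest();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] data = new byte[100 * 1024]; // 100 KB of sample data
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        System.out.println(Arrays.equals(oneShot(data), chunked(data, 1024)));
    }
}
```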


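As jarnbjo points out, SHA hashing in Java runs at roughly 20 MB/s on x86 hardware. That works out to about 5-10 ms for 100 KB (excluding disk I/O), far below your 10-second limit.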
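If you want to check the timing on your own machine, a rough sketch like the following hashes 100 KB of random bytes and reports the elapsed time per algorithm. It measures only the digest, not disk reads, and a single timed run is a crude benchmark, so treat the numbers as ballpark figures.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Random;

public class HashTiming {

    // Returns elapsed nanoseconds for hashing data once with the named algorithm.
    // Only the digest computation is timed; reading a file from disk is not included.
    static long timeDigest(String algorithm, byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        long start = System.nanoTime();
        md.digest(data);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] data = new byte[100 * 1024]; // roughly the file size from the question
        new Random(42).nextBytes(data);
        for (String alg : new String[] {"MD5", "SHA-1"}) {
            timeDigest(alg, data); // warm-up call so class loading and first-call overhead are not counted
            long nanos = timeDigest(alg, data);
            System.out.printf("%s: %.3f ms%n", alg, nanos / 1e6);
        }
    }
}
```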

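If you need a cryptographically strong hash, SHA-1 is the better of the two built-in choices; stronger algorithms are also available from third-party providers such as Bouncy Castle.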
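Which digests count as "built in" depends on the JDK and the installed providers. This sketch lists the registered MessageDigest names via Security.getAlgorithms and checks a few specific ones; registering an extra provider such as Bouncy Castle with Security.addProvider would add more names. The class and method names are illustrative.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.Security;
import java.util.Set;

public class ListDigests {

    // True if the installed providers can supply the named digest.
    static boolean isAvailable(String algorithm) {
        try {
            MessageDigest.getInstance(algorithm);
            return true;
        } catch (NoSuchAlgorithmException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Every MessageDigest name registered by the installed providers;
        // the exact set depends on the JDK version and any extra providers.
        Set<String> names = Security.getAlgorithms("MessageDigest");
        System.out.println("Registered: " + names);
        for (String alg : new String[] {"MD5", "SHA-1", "SHA-256"}) {
            System.out.println(alg + " available: " + isAvailable(alg));
        }
    }
}
```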

Example code:

import java.io.*;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Checksum
{
    private static final String ALGORITHM = "SHA-1"; // or "MD5" etc.

    public static byte[] createChecksum(String filename)
        throws IOException, NoSuchAlgorithmException
    {
        InputStream fis = new FileInputStream(filename);
        try
        {
            byte[] buffer = new byte[1024];
            MessageDigest complete = MessageDigest.getInstance(ALGORITHM);
            int numRead;
            do
            {
                numRead = fis.read(buffer);
                if (numRead > 0)
                {
                    // Feed only the bytes actually read into the digest
                    complete.update(buffer, 0, numRead);
                }
            } while (numRead != -1);
            return complete.digest();
        }
        finally
        {
            fis.close();
        }
    }
}
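Since the goal is to tell apart files that share a name and metadata, one way to use such a checksum is as a map key, grouping files by their hex-encoded content hash. This is a sketch with hypothetical class and method names; SHA-1 and the 8 KB buffer are arbitrary choices.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DuplicateFinder {

    // Hex-encoded SHA-1 of a file's contents, read in 8 KB chunks.
    static String sha1Hex(File file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        InputStream in = new FileInputStream(file);
        try {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        } finally {
            in.close();
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Groups files by content hash; a list with more than one entry holds
    // files whose contents are (almost certainly) identical.
    static Map<String, List<File>> groupByHash(List<File> files)
            throws IOException, NoSuchAlgorithmException {
        Map<String, List<File>> groups = new HashMap<String, List<File>>();
        for (File f : files) {
            String key = sha1Hex(f);
            List<File> group = groups.get(key);
            if (group == null) {
                group = new ArrayList<File>();
                groups.put(key, group);
            }
            group.add(f);
        }
        return groups;
    }

    public static void main(String[] args) throws Exception {
        List<File> files = new ArrayList<File>();
        for (String path : args) {
            files.add(new File(path));
        }
        for (Map.Entry<String, List<File>> e : groupByHash(files).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Run it with file paths as arguments; files printed on the same line have identical SHA-1 digests.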

Use MessageDigest with SHA-1:

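    MessageDigest messageDigest = MessageDigest.getInstance("SHA-1");
    InputStream is = new FileInputStream(aFile);
    try {
        // Read in chunks; reading one byte at a time works but is very slow
        byte[] buffer = new byte[8192];
        int res;
        while ((res = is.read(buffer)) != -1) {
            messageDigest.update(buffer, 0, res);
        }
    } finally {
        is.close();
    }

    byte[] digest = messageDigest.digest();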
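An alternative to calling update() yourself is java.security.DigestInputStream, which updates the digest as bytes are read through it. A sketch with illustrative class and method names; the demo uses an in-memory stream, but a FileInputStream works the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class StreamDigest {

    // Reads the stream to the end through a DigestInputStream, which calls
    // update() on the digest for every byte it passes through.
    static byte[] digestStream(InputStream in, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        DigestInputStream dis = new DigestInputStream(in, md);
        try {
            byte[] buffer = new byte[8192];
            while (dis.read(buffer) != -1) {
                // nothing to do: bytes are digested as a side effect of reading
            }
        } finally {
            dis.close();
        }
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "hello".getBytes("UTF-8");
        byte[] viaStream = digestStream(new ByteArrayInputStream(data), "SHA-1");
        byte[] direct = MessageDigest.getInstance("SHA-1").digest(data);
        System.out.println(Arrays.equals(viaStream, direct));
    }
}
```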

SHA-1 should be fine for this. MD5 would also work and is a little faster, but it is the weaker of the two.
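One concrete difference between the two is output size, which MessageDigest can report directly. This small sketch (illustrative names) prints the digest length of each algorithm.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestSizes {

    // Digest output size in bits for the named algorithm.
    static int digestBits(String algorithm) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance(algorithm).getDigestLength() * 8;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        for (String alg : new String[] {"MD5", "SHA-1"}) {
            System.out.println(alg + ": " + digestBits(alg) + " bits");
        }
    }
}
```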


This is how I do it. I think it should be fast enough, but check whether it finishes within 10 seconds.

package utils;

import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * This class is used to compute the hash value of any string.
 */
public class MyHasher {
    private static final String ALGORITHM = "MD5";

    /**
     * Compute hash value of any string
     * @param arg the string to compute hash value of.
     * @return the hex hash value as a string.
     */
    public static String getHash(String arg) {
        try {
            // A new MessageDigest per call: a shared static instance is not thread-safe
            MessageDigest md = MessageDigest.getInstance(ALGORITHM);
            // Encode explicitly so the hash does not depend on the platform charset
            byte[] hashValue = md.digest(arg.getBytes("UTF-8"));
            return convertToHex(hashValue);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Can't find implementation of " + ALGORITHM + " algorithm", e);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is not supported", e);
        }
    }

    /**
     * Converts a byte array to a human-readable string of hex digits
     * @param data the byte array to convert
     * @return lowercase hex representation of the byte array
     */
    public static String convertToHex(byte[] data) {
        StringBuilder buf = new StringBuilder(data.length * 2);
        for (int i = 0; i < data.length; i++) {
            // High nibble first (shift by 4, not 3), then low nibble
            buf.append(Character.forDigit((data[i] >>> 4) & 0x0F, 16));
            buf.append(Character.forDigit(data[i] & 0x0F, 16));
        }
        return buf.toString();
    }
}
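A hand-written hex converter is easy to get subtly wrong (for example, by shifting by the wrong number of bits), so it is worth checking against a known answer. This sketch, with an illustrative class name, hashes "abc" and compares the result with the MD5 test vector published in RFC 1321.

```java
import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HexCheck {

    // Two lowercase hex characters per byte: high nibble first, then low nibble.
    static String toHex(byte[] data) {
        StringBuilder buf = new StringBuilder(data.length * 2);
        for (byte b : data) {
            buf.append(Character.forDigit((b >>> 4) & 0x0F, 16));
            buf.append(Character.forDigit(b & 0x0F, 16));
        }
        return buf.toString();
    }

    static String md5Hex(String s) throws NoSuchAlgorithmException, UnsupportedEncodingException {
        return toHex(MessageDigest.getInstance("MD5").digest(s.getBytes("UTF-8")));
    }

    public static void main(String[] args) throws Exception {
        // RFC 1321 test vector: MD5("abc") = 900150983cd24fb0d6963f7d28e17f72
        String actual = md5Hex("abc");
        System.out.println(actual + " " + actual.equals("900150983cd24fb0d6963f7d28e17f72"));
    }
}
```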
