Here is my modified code as suggested by @gkuzmin:

    import java.math.BigInteger;

    public static BigInteger hashStringConcatenation(String str1, String str2) {
        StringBuffer codeA = new StringBuffer(), codeB = new StringBuffer();

        // Append the decimal code point of every char, with no separator.
        for (int i = 0; i < str1.length(); i++) {
            codeA.append(str1.codePointAt(i));
        }
        for (int i = 0; i < str2.length(); i++) {
            codeB.append(str2.codePointAt(i));
        }

        // Read each digit string as one big decimal number.
        BigInteger bA = new BigInteger(codeA.toString());
        BigInteger bB = new BigInteger(codeB.toString());

        // Multiply and keep the low 1024 bits.
        return bA.multiply(bB).mod(BigInteger.valueOf(2).pow(1024));
    }

Note that I now multiply bA by bB instead of adding them.
I also added the test function that @gkuzmin suggested:

    public static void breakTest2() {
        String firstString = new StringBuffer().append((char) 11).append((char) 111).toString();
        String secondString = new StringBuffer().append((char) 111).append((char) 11).toString();
        BigInteger hash1 = hashStringConcatenation(firstString, "arbitrary_string");
        BigInteger hash2 = hashStringConcatenation(secondString, "arbitrary_string");
        System.out.println("Is hash equal: " + hash1.equals(hash2));
        System.out.println("Conflicted values: {" + firstString + "},{" + secondString + "}");
    }

and another test with strings having only numeric values:
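
    import java.util.Hashtable;

    public static void breakTest1() {
        // Maps each hash (key) to the "i-j" pair that produced it (value).
        Hashtable<String, String> seenTable = new Hashtable<String, String>();
        for (int i = 0; i < 100; i++) {
            for (int j = i + 1; j < 100; j++) {
                String hash = hashStringConcatenation("" + i, "" + j).toString();
                // containsKey, not contains: Hashtable.contains(Object) searches the values.
                if (seenTable.containsKey(hash)) {
                    System.out.println("Duplication for " + seenTable.get(hash) + " with " + i + "-" + j);
                } else {
                    seenTable.put(hash, i + "-" + j);
                }
            }
        }
    }
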
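A note on the lookup in this test: the hash is stored as the key and the "i-j" pair as the value. `Hashtable.contains(Object)` tests whether a value is present, while `containsKey(Object)` tests keys, so only `containsKey` can detect a repeated hash. Here is a minimal sketch of the difference; the class name, helper names, and the sample entry are made up for illustration.

```java
import java.util.Hashtable;

public class HashtableLookupDemo {
    // Hypothetical helper: builds a table with one (hash -> pair) entry.
    static Hashtable<String, String> sampleTable() {
        Hashtable<String, String> seenTable = new Hashtable<>();
        seenTable.put("12345", "3-7"); // key = hash string, value = "i-j" label
        return seenTable;
    }

    // contains(Object) looks through the VALUES of the table.
    static boolean foundByContains(String hash) {
        return sampleTable().contains(hash);
    }

    // containsKey(Object) looks through the KEYS of the table.
    static boolean foundByContainsKey(String hash) {
        return sampleTable().containsKey(hash);
    }

    public static void main(String[] args) {
        System.out.println("contains:    " + foundByContains("12345"));
        System.out.println("containsKey: " + foundByContainsKey("12345"));
    }
}
```

With the stored hash "12345", the value lookup misses it and the key lookup finds it.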
I ran the code. breakTest1() is not an exhaustive test, but it reported no duplicates. (That result depends on the lookup checking the table's keys: Hashtable.contains tests values, containsKey tests keys.) The @gkuzmin function prints the following:

    Is hash equal: true
    Conflicted values: { o},{o }

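Why do the two strings produce the same hash? Because the decimal code points are appended without any separator, both first arguments turn into the same digit string: "11" + "111" and "111" + "11" are both "11111". So the function effectively works with "11111" and "arbitrary_string" in both cases. This is a problem.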
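To make the ambiguity visible, here is a small sketch that isolates the encoding step. `encode` mirrors the loop in hashStringConcatenation; `encodeWithSeparator` is a hypothetical variant (not part of the code above) that only shows how a delimiter keeps the character boundaries apart. Its output is not a plain number anymore, so it could not be passed to `new BigInteger(...)` as is.

```java
public class CodePointEncodingDemo {
    // Same encoding as in hashStringConcatenation: decimal code points, no separator.
    static String encode(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(s.codePointAt(i));
        }
        return sb.toString();
    }

    // Hypothetical variant: a comma after each code point keeps the boundaries visible.
    static String encodeWithSeparator(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(s.codePointAt(i)).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String first = "" + (char) 11 + (char) 111;
        String second = "" + (char) 111 + (char) 11;

        // Prints: 11111 / 11111
        System.out.println(encode(first) + " / " + encode(second));
        // Prints: 11,111, / 111,11,
        System.out.println(encodeWithSeparator(first) + " / " + encodeWithSeparator(second));
    }
}
```

Any encoding that lets different character sequences map to the same digit string will collide this way, whatever is done with the numbers afterwards.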