Assign a random but unique identifier to each row of a MySQL table, with identical rows getting the same identifier

I need to assign a random but unique identifier to each row in a MySQL table. The identifier must be the same if the rows contain the same values.

For example, if the first row contains [hi, hello, bye], the second row contains [gg, hello, bye], and the third row contains [hi, hello, bye], then the first and third rows should get the same identifier and the second row must get a different one.

Thanks in advance.

+4
4 answers
SELECT CRC32(CONCAT(column1, column2, column3)) FROM MyTable;

Technically, CRC32 is not random (but then, what is?), and it has a small chance of producing collisions (mapping different values to the same integer). But it's a start.
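
To put a rough number on that "small chance", here is a sketch of the birthday approximation in Java. It assumes the hash values are uniformly distributed, which CRC32 only approximates, and the class and method names are made up for illustration.

```java
final class CollisionEstimate {
    // Birthday approximation: p ≈ 1 - exp(-n(n-1) / (2 * 2^bits)).
    // Assumption: hash outputs are uniformly distributed over 2^bits values.
    static double collisionProbability(long rows, int bits) {
        double space = Math.pow(2.0, bits);
        double pairs = (double) rows * (rows - 1) / 2.0;
        // -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
        return -Math.expm1(-pairs / space);
    }

    public static void main(String[] args) {
        long[] sizes = {1_000L, 100_000L, 1_000_000L};
        for (long n : sizes) {
            System.out.printf("rows=%d  32-bit=%.6f  128-bit=%.3e%n",
                    n, collisionProbability(n, 32), collisionProbability(n, 128));
        }
    }
}
```

Under these assumptions, 100,000 distinct rows already give roughly a two-in-three chance of at least one collision with a 32-bit value, while a 128-bit hash such as MD5 stays negligibly small at the same table size.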

+1

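An MD5 hash can work. Below is a quick-and-dirty snippet that needs cleaning up but proves the concept.

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    public class RowHash {
        public static void main(String[] args) {
            String test1 = "hi,hello,bye";
            String test2 = "gg,hello,bye";
            String test3 = "hi,hello,bye";
            RowHash tst1 = new RowHash();
            System.out.println("row1=" + test1 + ":" + tst1.getHash(test1));
            System.out.println("row2=" + test2 + ":" + tst1.getHash(test2));
            System.out.println("row3=" + test3 + ":" + tst1.getHash(test3));
        }

        // Returns the MD5 digest of inputStr as a 32-character hex string.
        private String getHash(String inputStr) {
            try {
                MessageDigest md = MessageDigest.getInstance("MD5");
                md.update(inputStr.getBytes(StandardCharsets.UTF_8));
                byte[] byteData = md.digest();
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < byteData.length; i++) {
                    // Adding 0x100 forces two hex digits; substring(1) drops the leading "1".
                    sb.append(Integer.toString((byteData[i] & 0xff) + 0x100, 16).substring(1));
                }
                return sb.toString();
            } catch (Exception e) {
                e.printStackTrace();
                return null;
            }
        }
    }

Output:

    row1=hi,hello,bye:cfe40e96aa052a484208c2aefb6f39bb
    row2=gg,hello,bye:f652785f0e214507e6aea44ecd3ffb7a
    row3=hi,hello,bye:cfe40e96aa052a484208c2aefb6f39bb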
+3

If you really need a guarantee that you will not get collisions, the only option is to concatenate all fields, using a separator that does not occur in any of the fields. Of course, the result will usually be very long and cumbersome.

What everyone usually does instead is feed this string to a hash function. Although the result is theoretically not unique, with a suitable hash function and a sufficiently large output you should be able to find one that is unlikely to produce a collision during the lifetime of the human race. For example, git uses such a hash (SHA-1), and Linus Torvalds writes about accidental collisions:

First of all, let me remind people that the unintentional kind of collision is actually really, really unlikely, so we will very likely never see it in the entire history of the universe.

Non-accidental collisions are another matter. First of all, you need to make sure that the string you start with is not the same for different column values. That means:

  • Make sure all columns are included.
  • Make sure the columns are separated by something that does not occur in the columns themselves. Use escaping if necessary. For example, if you just concatenate two columns, the values "abc" + "def" will give you the same result as "a" + "bcdef".
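
The two points above can be sketched in Java as follows. The separator `|`, the escape character `\` and the names `RowKey` and `canonical` are arbitrary choices for illustration; the sketch also assumes a fixed number of non-null columns (an SQL NULL would need its own marker).

```java
final class RowKey {
    private static final char SEPARATOR = '|';
    private static final char ESCAPE = '\\';

    // Joins column values into one string. For a fixed number of columns,
    // different value lists never produce the same result, because the
    // separator and the escape character are escaped inside each value.
    static String canonical(String... columns) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < columns.length; i++) {
            if (i > 0) {
                sb.append(SEPARATOR);
            }
            for (char c : columns[i].toCharArray()) {
                if (c == SEPARATOR || c == ESCAPE) {
                    sb.append(ESCAPE);
                }
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(canonical("abc", "def"));  // abc|def
        System.out.println(canonical("a", "bcdef"));  // a|bcdef
        System.out.println(canonical("a|b", "c"));    // a\|b|c
        System.out.println(canonical("a", "b|c"));    // a|b\|c
    }
}
```

The resulting string is what you would then pass to the hash function, so that "abc" + "def" and "a" + "bcdef" no longer start from the same input.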

If you need to worry about targeted attacks, that is, someone deliberately trying to create entries with the same hash, it is best to use a cryptographic hash, possibly one of those used for hashing passwords, which are often designed to be slow in order to hinder brute-force attacks. Of course, this may conflict with the requirement, common to most applications, to be as fast as possible.
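
As a minimal sketch of the cryptographic-hash option, the following Java snippet uses SHA-256. Note that this is a fast general-purpose hash, not one of the deliberately slow password hashes mentioned above, and that the class name and the `|`-separated input strings are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Sha256RowId {
    // Returns the SHA-256 digest of the row string as 64 lowercase hex digits.
    static String id(String rowKey) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(rowKey.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Rows 1 and 3 hold the same values and print the same identifier.
        System.out.println(id("hi|hello|bye"));
        System.out.println(id("gg|hello|bye"));
        System.out.println(id("hi|hello|bye"));
    }
}
```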

0

What you need is a hash function over all the values in the row that you care about. It cannot be random, because by definition it has to be deterministic: the same values must always give the same identifier. If "random" means "not sequential", most hash functions should satisfy this need.

Theoretically, you cannot guarantee uniqueness, since there is always a chance of collisions. That is, different identifiers definitely mean that the row values are different, but the converse is not always true: equal identifiers do not prove equal values. Depending on your needs, you can explicitly compare the actual row values whenever matching identifiers are encountered. You can also consider using a cryptographic hash function such as MD5 or SHA-1 and rely on the probabilities being on your side (an accidental collision in such a function would be a remarkable find; note, though, that deliberately constructed collisions are known for both MD5 and SHA-1).
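
The explicit comparison on matching identifiers could look roughly like this in Java. The sketch deliberately uses a weak 32-bit hash (CRC32 over a naive comma join) so that collisions are easy to provoke; the class `VerifiedLookup` and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.CRC32;

final class VerifiedLookup {
    // Rows grouped by hash value; rows in one bucket share a hash
    // but are not necessarily equal.
    private final Map<Long, List<List<String>>> buckets = new HashMap<>();

    // Weak on purpose: the naive join lets different rows hash alike.
    static long hash(List<String> row) {
        CRC32 crc = new CRC32();
        crc.update(String.join(",", row).getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Returns true if an identical row was stored before; otherwise stores it.
    boolean seenBefore(List<String> row) {
        List<List<String>> bucket =
                buckets.computeIfAbsent(hash(row), k -> new ArrayList<>());
        for (List<String> candidate : bucket) {
            if (candidate.equals(row)) {  // compare actual values, not just hashes
                return true;
            }
        }
        bucket.add(new ArrayList<>(row));
        return false;
    }

    public static void main(String[] args) {
        VerifiedLookup lookup = new VerifiedLookup();
        System.out.println(lookup.seenBefore(Arrays.asList("hi", "hello", "bye")));  // false
        System.out.println(lookup.seenBefore(Arrays.asList("gg", "hello", "bye")));  // false
        System.out.println(lookup.seenBefore(Arrays.asList("hi", "hello", "bye")));  // true
        // Same hash as the first row, but different values, so not a duplicate.
        System.out.println(lookup.seenBefore(Arrays.asList("hi,hello", "bye")));     // false
    }
}
```

Because equality is decided on the stored values, a hash collision costs an extra comparison but never merges two different rows.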
0
