Hash function that generates short hashes?

Is there a hashing method that can take a string of any length and produce a hash up to 10 characters long? I want to create a reasonably unique identifier, but one based on the contents of the message rather than chosen randomly.

I can limit the messages to integer values if strings of arbitrary length are not possible. In that case, though, the hash should not be similar for two consecutive integers.

+78
Tags: uniqueidentifier, encryption
Dec 30 '10 at 23:32
9 answers

You can use any well-known hash algorithm (e.g. SHA-1), which will give you a somewhat longer result than you need. Simply truncate the result to the desired length, which may be good enough.

For example, in Python:

    >>> import hashlib
    >>> hash = hashlib.sha1("my message".encode("UTF-8")).hexdigest()
    >>> hash
    '104ab42f1193c336aa2cf08a2c946d5c6fd0fcdb'
    >>> hash[:10]
    '104ab42f11'
+66
Dec 30 '10 at 23:37

If you don't need an algorithm that is strong against intentional modification, I found an algorithm called Adler-32 that produces fairly short results (~8 characters). Select it from the drop-down list to try it:

http://www.sha1-online.com/
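In Python you don't even need the website; the zlib module in the standard library ships Adler-32 directly (a small sketch, with an example message):

```python
import zlib

# Adler-32 yields a 32-bit integer, i.e. at most 8 hex characters.
checksum = zlib.adler32(b"my message")
short_hash = format(checksum, "08x")
print(short_hash)  # -> '154703ec'
```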

+37
18 Oct '16 at 10:40

You need to hash the contents to get a digest. There are many hashes available, but 10 characters is pretty small for the result set. Way back in the day, people used CRC-32, which produces a 32-bit hash (8 hex characters). There is also CRC-64, which produces a 64-bit hash. MD5, which produces a 128-bit hash (16 bytes), is considered broken for cryptographic purposes because two messages with the same hash can be found. It should go without saying that any time you create a 16-byte digest from arbitrary-length messages, you will get duplicates. The shorter the digest, the greater the risk of collision.

However, your concern that the hash not be similar for two consecutive messages (whether integers or not) should hold with all hashes. Even a single-bit change in the original message produces a significantly different digest.

So, using something like CRC-64 (and base-64 encoding the result) should put you in the neighborhood you are looking for.
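Python's standard library has no CRC-64, but the same recipe is easy to sketch with zlib's CRC-32 (the message here is just an illustration):

```python
import base64
import struct
import zlib

message = b"my message"
crc = zlib.crc32(message)                  # 32-bit integer
packed = struct.pack(">I", crc)            # 4 big-endian bytes
token = base64.urlsafe_b64encode(packed).rstrip(b"=").decode()
print(token)                               # 6 characters covering all 32 bits
```

With a real CRC-64 the same scheme yields 11 characters instead of 6.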

+10
Dec 30 '10 at 23:54

I'm just summarizing the answer that was useful to me (noting @erasmospunk's comment about using base-64 encoding). My goal was to have a short string that would be as unique as possible...

I am not an expert, so please correct this if it has any obvious errors (in Python again, like the accepted answer):

    import base64
    import hashlib
    import uuid

    unique_id = uuid.uuid4()
    # unique_id = UUID('8da617a7-0bd6-4cce-ae49-5d31f2a5a35f')
    hash = hashlib.sha1(str(unique_id).encode("UTF-8"))
    # hash.hexdigest() = '882efb0f24a03938e5898aa6b69df2038a2c3f0e'
    result = base64.b64encode(hash.digest())
    # result = b'iC77DySgOTjliYqmtp3yA4osPw4='

The result here uses more than just hexadecimal characters (which is what you would get from hash.hexdigest()), so each character carries more information and the likelihood of a collision after truncation is lower (i.e. truncating the base-64 string is safer than truncating the hex digest).
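To see why: each hex character carries 4 bits of the digest while each base-64 character carries 6, so a 10-character prefix keeps 40 bits versus 60. A small sketch using the message from the accepted answer:

```python
import base64
import hashlib

digest = hashlib.sha1(b"my message").digest()        # 20 raw bytes

hex_prefix = digest.hex()[:10]                       # 40 bits of the hash
b64_prefix = base64.b64encode(digest)[:10].decode()  # 60 bits of the hash
print(hex_prefix)  # -> '104ab42f11'
print(b64_prefix)
```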

Note: using UUID4 here is optional. See http://en.wikipedia.org/wiki/Universally_unique_identifier for the other types.

+7
Apr 19 '14 at 1:47

You can use an existing hash algorithm that produces something short, like MD5 (128 bits) or SHA-1 (160 bits). You can then shorten it further by XORing sections of the digest with other sections. This will increase the chance of collisions, but not as badly as simply truncating the digest.

In addition, you can include the length of the source data as part of the result to make it more unique. For example, XORing the first half of an MD5 digest with the second half results in 64 bits. Add 32 bits for the data length (or fewer if you know the length will always fit in fewer bits). That gives a 96-bit (12-byte) result, which you can turn into a 24-character hex string. Alternatively, you can use base-64 encoding to make it even shorter.
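A rough sketch of that folding idea in Python (the function name and message are illustrative, not from the answer):

```python
import base64
import hashlib
import struct

def short_id(data: bytes) -> str:
    digest = hashlib.md5(data).digest()                        # 16 bytes
    # XOR the first half of the digest with the second half -> 8 bytes.
    folded = bytes(a ^ b for a, b in zip(digest[:8], digest[8:]))
    # Append the data length as 4 big-endian bytes -> 12 bytes total.
    packed = folded + struct.pack(">I", len(data) & 0xFFFFFFFF)
    # 12 bytes fit exactly into 16 base-64 characters (no padding).
    return base64.b64encode(packed).decode()

print(short_id(b"my message"))  # 16 characters
```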

+6
Dec 03 '16 at 3:14

If you need a sub-10-character hash, you can use the Fletcher-32 algorithm, which produces an 8-hex-digit hash (32 bits), or CRC-32 or Adler-32.

CRC-32 is roughly 20% to 100% slower than Adler-32, depending on the implementation.

Fletcher-32 is slightly more reliable than Adler-32, and it has a lower computational cost than the Adler checksum: see this comparison of Fletcher and Adler.

An example program with two Fletcher implementations is shown below:

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>  /* for uint32_t */

    uint32_t fletcher32_1(const uint16_t *data, size_t len)
    {
        uint32_t c0, c1;
        unsigned int i;

        for (c0 = c1 = 0; len >= 360; len -= 360) {
            for (i = 0; i < 360; ++i) {
                c0 = c0 + *data++;
                c1 = c1 + c0;
            }
            c0 = c0 % 65535;
            c1 = c1 % 65535;
        }
        for (i = 0; i < len; ++i) {
            c0 = c0 + *data++;
            c1 = c1 + c0;
        }
        c0 = c0 % 65535;
        c1 = c1 % 65535;
        return (c1 << 16 | c0);
    }

    uint32_t fletcher32_2(const uint16_t *data, size_t l)
    {
        uint32_t sum1 = 0xffff, sum2 = 0xffff;

        while (l) {
            unsigned tlen = l > 359 ? 359 : l;
            l -= tlen;
            do {
                sum2 += sum1 += *data++;
            } while (--tlen);
            sum1 = (sum1 & 0xffff) + (sum1 >> 16);
            sum2 = (sum2 & 0xffff) + (sum2 >> 16);
        }
        /* Second reduction step to reduce sums to 16 bits */
        sum1 = (sum1 & 0xffff) + (sum1 >> 16);
        sum2 = (sum2 & 0xffff) + (sum2 >> 16);
        return (sum2 << 16) | sum1;
    }

    int main()
    {
        char *str1 = "abcde";
        char *str2 = "abcdef";

        size_t len1 = (strlen(str1) + 1) / 2;  /* '\0' will be used for padding */
        size_t len2 = (strlen(str2) + 1) / 2;

        uint32_t f1 = fletcher32_1((const uint16_t *)str1, len1);
        uint32_t f2 = fletcher32_2((const uint16_t *)str1, len1);
        printf("%u %X \n", f1, f1);
        printf("%u %X \n\n", f2, f2);

        f1 = fletcher32_1((const uint16_t *)str2, len2);
        f2 = fletcher32_2((const uint16_t *)str2, len2);
        printf("%u %X \n", f1, f1);
        printf("%u %X \n", f2, f2);
        return 0;
    }

Output:

    4031760169 F04FC729
    4031760169 F04FC729

    1448095018 56502D2A
    1448095018 56502D2A

This agrees with the test vectors:

 "abcde" -> 4031760169 (0xF04FC729) "abcdef" -> 1448095018 (0x56502D2A) 

Adler-32 has a weakness for short messages of a few hundred bytes, because the checksums for these messages have poor coverage of the 32 available bits. Check this:

The Adler32 algorithm is not sophisticated enough to compete with comparable checksums .

+3
Feb 17 '18 at 1:50

You can use the hashids library, which has implementations for PHP, JavaScript, Python, etc. See this link for more details.

+2
Oct 20 '14 at 7:08

Just run this in a terminal (on macOS or Linux):

 crc32 <(echo "some string") 

The output is 8 hex characters long.

+1
Mar 05 '19 at 4:32

I recently needed something like a simple string-reduction function. Essentially, the code looked something like this (C/C++ code below):

    size_t ReduceString(char *Dest, size_t DestSize, const char *Src, size_t SrcSize, bool Normalize)
    {
        size_t x, x2 = 0, y, z = 0;

        memset(Dest, 0, DestSize);
        for (x = 0; x < SrcSize; x++) {
            Dest[x2] = (char)(((unsigned int)(unsigned char)Dest[x2]) * 37 +
                              ((unsigned int)(unsigned char)Src[x]));
            x2++;
            if (x2 == DestSize - 1) {
                x2 = 0;
                z++;
            }
        }

        // Normalize the alphabet if it looped.
        if (z && Normalize) {
            unsigned char TempChr;

            y = (z > 1 ? DestSize - 1 : x2);
            for (x = 1; x < y; x++) {
                TempChr = ((unsigned char)Dest[x]) & 0x3F;

                if (TempChr < 10)       TempChr += '0';
                else if (TempChr < 36)  TempChr = TempChr - 10 + 'A';
                else if (TempChr < 62)  TempChr = TempChr - 36 + 'a';
                else if (TempChr == 62) TempChr = '_';
                else                    TempChr = '-';

                Dest[x] = (char)TempChr;
            }
        }

        return (SrcSize < DestSize ? SrcSize : DestSize);
    }

It may have more collisions than desired, but it is not intended for use as a cryptographic hash function. You can try different multipliers (i.e. change the 37 to another prime) if you get too many collisions. One interesting feature of this snippet is that when Src is shorter than Dest, Dest ends up containing the input string as-is (0 * 37 + value = value). If you need something "readable" at the end of the process, Normalize will adjust the transformed bytes, at the cost of more collisions.

Source:

https://github.com/cubiclesoft/cross-platform-cpp/blob/master/sync/sync_util.cpp

0
May 26 '16 at 3:53
