You need to be careful when using the built-in hash functions of common programming languages. It has become customary to introduce a randomized seed into hash functions, so that hash values are only valid within a single program run. This avoids the denial-of-service attacks described in oCERT advisory 2011-3. (As the advisory notes, the problem was already described in 2003 in a paper presented at USENIX.)
For example, Python's hash function has been randomized by default since version 3.3:
    $ python3 -c 'from sys import argv;print(hash(argv[1]))' abc
    -2595772619214671013
    $ python3 -c 'from sys import argv;print(hash(argv[1]))' abc
    -6001956461950650533
    $ python3 -c 'from sys import argv;print(hash(argv[1]))' abc
    -7414807274805087300
    $ python3 -c 'from sys import argv;print(hash(argv[1]))' abc
    -327608370992723225
You can control Python's hash randomization by setting the environment variable PYTHONHASHSEED.
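A minimal sketch of what that looks like in practice, assuming python3 is on the PATH: with PYTHONHASHSEED set to a fixed integer, the same string hashes to the same value on every run (setting it to 0 disables randomization entirely). The seed 1234 and the helper name py_hash are arbitrary choices for illustration.

```shell
#!/usr/bin/env bash
# py_hash is a hypothetical helper name; the seed 1234 is arbitrary.
# Pinning PYTHONHASHSEED makes str hashes repeat across interpreter runs.
py_hash() {
  PYTHONHASHSEED=1234 python3 -c 'from sys import argv; print(hash(argv[1]))' "$1"
}

first=$(py_hash abc)
second=$(py_hash abc)
echo "$first $second"   # the two values are identical
```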
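Alternatively, you can use a standardized cryptographic hash such as SHA-1. The commonly available sha1sum utility prints the result in hexadecimal, but you can convert it to decimal with bash arithmetic (truncated to 64 bits). The trailing 0 is there because sha1sum prints the file name - after the digest; appending 0 turns that into a harmless subtraction of zero: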
    $ echo $((0x$(sha1sum <<<"string to hash")0))
    -7037254581539467098
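A slightly more explicit variant of the 64-bit conversion, offered as a sketch: it strips sha1sum's file-name field with cut and converts only the last 16 hex digits, which should correspond to the low 64 bits that bash keeps when it truncates. It hashes the string's exact bytes with printf '%s', so unlike the here-string version no newline is added and the output differs from the example above. The helper name sha1_64 is hypothetical.

```shell
#!/usr/bin/env bash
# sha1_64 is a hypothetical helper name, not a standard command.
# Prints the low 64 bits of SHA-1($1) as bash's signed 64-bit integer.
sha1_64() {
  local hex
  # printf '%s' hashes the exact bytes of $1 (no trailing newline);
  # cut keeps the 40 hex digits and drops sha1sum's "  -" file-name field.
  hex=$(printf '%s' "$1" | sha1sum | cut -d' ' -f1)
  # The last 16 hex digits are the low 64 bits; 16#... parses them as base 16.
  echo $(( 16#${hex:24} ))
}

sha1_64 "string to hash"
```

Values whose top bit is set come out negative, just as in the example above, because bash integers are signed.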
or in its full 160-bit glory using bc (which requires hex to be uppercase):
    $ bc <<<ibase=16\;$(sha1sum <<<"string to hash"|tr a-z A-Z)0
    861191872165666513280590001082621748432296579238
If you only need a hash value modulo some power of 16, you can use the first few hex digits of the SHA-1 sum. (Any selection of digits will do, since they are all equally well distributed, but the first ones are the easiest to extract.)
    $ echo $((0x$(sha1sum <<<"string to hash"|cut -c1-2)))
    150
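The first-digits approach generalizes to a small bucketing helper. This is a sketch, assuming the number of buckets is 16^k for some k: taking k hex digits yields a value in 0..16^k-1. The helper name sha1_bucket is hypothetical, and k should stay at 15 or below so the result fits in bash's signed 64-bit integers.

```shell
#!/usr/bin/env bash
# sha1_bucket is a hypothetical helper name, not a standard command.
# Maps $1 to one of 16^$2 buckets using the first $2 hex digits of SHA-1($1).
# Keep $2 <= 15 so the value stays positive in bash's 64-bit arithmetic.
sha1_bucket() {
  local str=$1 digits=$2 hex
  hex=$(printf '%s' "$str" | sha1sum | cut -c1-"$digits")
  echo $(( 16#$hex ))
}

sha1_bucket "string to hash" 2   # a value in 0..255
sha1_bucket "string to hash" 3   # a value in 0..4095
```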
Note: As @gniourf_gniourf points out in a comment, the above does not actually compute the SHA-1 checksum of the given string, because bash's here-string syntax (<<<word) appends a newline to word. Since the checksum of the string with a newline appended is just as good a hash as the checksum of the string itself, this causes no problem as long as you always use the same mechanism to produce the hash.
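A quick way to see the newline issue for yourself, as a sketch: a here-string gives the same digest as printf with an explicit newline, while printf '%s' hashes the bare string. The variable names are arbitrary.

```shell
#!/usr/bin/env bash
# A here-string (<<<) feeds sha1sum the word plus a trailing newline.
with_newline=$(sha1sum <<<"string to hash")
explicit_newline=$(printf '%s\n' "string to hash" | sha1sum)
exact=$(printf '%s' "string to hash" | sha1sum)

[ "$with_newline" = "$explicit_newline" ] && echo "here-string matches printf '%s\n'"
[ "$with_newline" != "$exact" ] && echo "here-string differs from printf '%s'"
```

If you need digests that agree with other tools hashing the bare string, printf '%s' is the form to use.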