1) For a really low collision rate, can I get away with using just 64 bits of a SHA-1 digest (half of its first 128 bits) instead of dealing with the full SHA-1 output? I understand that a truncated hash is not suitable for cryptographic purposes, but I only need the hashes as keys for a hash table.
2) Computation time is not a priority, and I am hashing very small pieces of data. Specifically, I'm going to take 2 or 3 64-bit hashes and hash them together to get another 64-bit hash. Is there a better option than SHA-1 for this purpose? Again, collisions should be very unlikely.
3) I am an SQL newbie. Is it a good idea to use 64-bit hashes as IDs in SQL? Will a 64-bit ID cause performance issues in SQLite or Postgres? I need to coordinate data across several data stores (including a Lucene index), so I figured I should work with the hashes directly in the tables rather than bother with auto-incrementing IDs (which would only be meaningful within one database, not across all the data stores). I believe 64 bits is a good compromise: big enough to make collisions unlikely, but it saves space (and lookup time?).
4) What about CRC-64? Does it produce a sufficiently random distribution?
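For concreteness, the scheme described in points 1 to 3 could be sketched in Python roughly as follows. This is only an illustration under assumptions, not a settled design: the function names `combine_hashes`, `to_signed64`, and `collision_probability` are made up for this example, the choice of big-endian packing and of keeping the first 8 bytes of the digest is arbitrary, and the collision estimate is the usual birthday approximation, which assumes the truncated output behaves like a uniformly random 64-bit value.

```python
import hashlib
import math
import struct


def combine_hashes(*hashes: int) -> int:
    """Hash several unsigned 64-bit values into one unsigned 64-bit value.

    Hypothetical helper: packs the inputs as fixed-width big-endian
    integers, runs SHA-1 over them, and keeps the first 8 bytes.
    """
    packed = struct.pack(f">{len(hashes)}Q", *hashes)
    digest = hashlib.sha1(packed).digest()  # 20 bytes (160 bits)
    return int.from_bytes(digest[:8], "big")


def to_signed64(value: int) -> int:
    """Map an unsigned 64-bit value onto the signed 64-bit range.

    SQLite INTEGER and PostgreSQL BIGINT columns are signed 64-bit,
    so values at or above 2**63 must be shifted before being stored.
    """
    return value - (1 << 64) if value >= (1 << 63) else value


def collision_probability(n_keys: int, bits: int = 64) -> float:
    """Birthday approximation: 1 - exp(-n(n-1) / 2**(bits+1))."""
    return -math.expm1(-n_keys * (n_keys - 1) / 2.0 ** (bits + 1))


if __name__ == "__main__":
    key = combine_hashes(0x0123456789ABCDEF, 0xFEDCBA9876543210, 42)
    print(hex(key), to_signed64(key))
    # Rough chance of at least one collision among one billion keys.
    print(collision_probability(10**9))
```

Under that uniformity assumption, the estimate stays small for millions of keys and only approaches even odds once the number of keys is on the order of 2^32.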
hash sha1
Jegschemesch