Suitable hash function for hashing random binary strings

I have arrays such as char data1[length], where length is a multiple of 8, i.e. length can be 8, 16, 24, ... Each array contains binary data read from a file that is opened in binary mode. I keep reading from the file, and every time I read, I store the value I read in a hash table. The binary data is randomly distributed. I would like to hash each array and store it in a hash table so that I can later search for a char array with specific data. What would be a good hash function for this? Thanks.

Please note that I am writing this in C++ and C, so a solution in either language would be great.

+5
2 answers

If the data you are reading is 8 bytes long and really is randomly distributed, and your hash code needs to be 32 bits, how about this:

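#include <stdint.h>

/* Read four bytes as a little-endian 32-bit value. */
uint32_t get_uint32_le(const unsigned char *data) {
  uint32_t value = 0;
  /* Cast before shifting: data[3] << 24 on a promoted int could overflow into the sign bit. */
  value |= (uint32_t)data[0] << 0;
  value |= (uint32_t)data[1] << 8;
  value |= (uint32_t)data[2] << 16;
  value |= (uint32_t)data[3] << 24;
  return value;
}

/* XOR the two 32-bit halves of an 8-byte block. */
uint32_t hashcode(const unsigned char *data) {
  uint32_t hash = 0;
  hash ^= get_uint32_le(data + 0);
  hash ^= get_uint32_le(data + 4);
  return hash;
}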

If you need more speed, this code can probably be made much faster if you can guarantee that data is always correctly aligned to be interpreted as a const uint32_t *.
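
The question mentions arrays of 16, 24, ... bytes as well, so here is a rough sketch of how the same XOR-folding idea might be extended to any length that is a multiple of 8. The name hash_xor_fold is made up for illustration. It loads each word with memcpy, which avoids needing any alignment guarantee and which compilers commonly turn into a plain load. The words are read in the host byte order, so the result differs between little-endian and big-endian machines; that should not matter for a hash table that only lives in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: XOR together all 32-bit words of the buffer.
   Assumes length is a multiple of 8, as stated in the question. */
uint32_t hash_xor_fold(const unsigned char *data, size_t length) {
  uint32_t hash = 0;
  size_t i;

  assert(length % 8 == 0);
  for (i = 0; i < length; i += 4) {
    uint32_t word;
    /* memcpy reads the word in host byte order without any alignment requirement. */
    memcpy(&word, data + i, sizeof word);
    hash ^= word;
  }
  return hash;
}
```

This only spreads keys well if the input bytes really are random, as the question states. Structured data (for example two identical halves, which always XOR to 0) would collide heavily.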

+3

I have successfully used MurmurHash3 in one of my projects.

Pros:

  • Fast. Very fast.
  • It supposedly has a low collision rate.

Cons:

  • Not suitable for cryptographic applications.
  • It is not standardized in any way, shape or form.
  • , x86. , , , - Java, .
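
For orientation, here is a simplified sketch of what the 32-bit x86 variant of MurmurHash3 looks like, restricted to inputs whose length is a multiple of 4 (which covers the multiple-of-8 arrays from the question). The function name murmur3_32_blocks is made up, the tail handling for leftover bytes is deliberately left out, and the constants are written down as I understand the public-domain reference code, so please check them against the upstream source before relying on this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, int r) {
  return (x << r) | (x >> (32 - r));
}

/* Final mixing step: spreads every input bit over the whole result. */
static uint32_t fmix32(uint32_t h) {
  h ^= h >> 16;
  h *= 0x85ebca6bu;
  h ^= h >> 13;
  h *= 0xc2b2ae35u;
  h ^= h >> 16;
  return h;
}

/* Read four bytes as a little-endian 32-bit value, independent of host byte order. */
static uint32_t load_le32(const unsigned char *p) {
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical simplified variant: len must be a multiple of 4,
   so the tail handling of the full algorithm is omitted. */
uint32_t murmur3_32_blocks(const unsigned char *data, size_t len, uint32_t seed) {
  const uint32_t c1 = 0xcc9e2d51u;
  const uint32_t c2 = 0x1b873593u;
  uint32_t h = seed;
  size_t i;

  assert(len % 4 == 0);
  for (i = 0; i < len; i += 4) {
    uint32_t k = load_le32(data + i);
    k *= c1;
    k = rotl32(k, 15);
    k *= c2;
    h ^= k;
    h = rotl32(h, 13);
    h = h * 5u + 0xe6546b64u;
  }
  h ^= (uint32_t)len;
  return fmix32(h);
}
```

To pick a bucket, one would typically reduce the result with something like murmur3_32_blocks(data, length, seed) % table_size, where table_size and seed are whatever the hash table uses.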


+2
