Fast bidirectional hash of two integers in C

I am writing a Linux kernel module, and I need to come up with a hash function that takes two integers for input. Since the code runs in kernel space, none of the standard libraries are available to me.

Basically, I need a hash function, where:

hash(a, b) = c hash(b, a) = c 

Where the acceptable inputs for a and b are unsigned 32-bit integers. The hash function should return a 64-bit unsigned integer. Collision (i.e. Hash (a, b) = c and hash (d, f) = c as well) is undesirable since these values ​​will be used in the binary search tree. The search result is a linked list of possible results, which is then repeated, where a and b are actually compared. Thus, some collision is acceptable, but the less collisions, the less iterations are required and the faster it will work.

Performance is also extremely important, this search will be used for every packet received on the system, as I am writing a firewall application (integers are, in fact, the source of the packet and the destination address). This function is used to search for existing network sessions.

Thank you for your time.

+7
source share
5 answers

Pseudocode how you can do this:

 if a>b return (a << 32) | b; else return (b << 32) | a; 

This satisfies the hash (a, b) == hash (b, a), uses the full 64-bit space and should not have collisions ... I think :)

Be careful not to shift 32-bit variables directly. Use intermediate 64-bit buffers or inline casts instead:

 uint64_t myhash(uint32_t a, uint32_t b) { uint64 a64 = (uint64_t) a; uint64 b64 = (uint64_t) b; return (a > b) ? ((a64 << 32) | b64) : ((b64 << 32) | a64); } 
+6
source
 ((a | b) << 32) + (a & b) 

is commutative and should result in a minimum number of collisions. I have to think about it though ...

+3
source
 #define MYHASH(a,b) ( (((UINT64) max(a,b)) << 32) | ((UINT64) min(a,b)) ) 
+3
source

How about ((uint64_t)max(a, b) << UINT64_C(32)) | (uint64_t)min(a, b)) ((uint64_t)max(a, b) << UINT64_C(32)) | (uint64_t)min(a, b)) ? This would avoid collisions completely, since there is no possible overlap between the entrances. I cannot speak with distribution, as it depends on your input values.

+1
source

(a ^ b) | ((a ^ ~ b) <32);

+1
source

All Articles