Fast calculation of hamming distance in C

I read the Wikipedia article on Hamming 's weight and noticed something interesting:

Thus, it is equivalent to Hamming distance from the entire zero line of the same length . For the most typical case, a string of bits is the number 1 in a string. In this binary case, it is also called popcount or side counting .

[emphasis mine]

So something happened to me. Can I calculate the Hamming distance between two lines on XOR ing , and then take the Hamming weight (POPCOUNT) of the resulting row?

Something along the lines of this (using gcc intrinsics):

 #include <stdint.h> int hammingDistance (uint64_t x, uint64_t y) { uint64_t res = x ^ y; return __builtin_popcountll (res); } 

Now, as to why I would like to do this, well, on some platforms, yes, it will just translate to gcc , calling the function that evaluates popcount . For example, on x64 without popcnt , gcc spits out ( Godbolt GCC Online ):

 hammingDistance: sub rsp, 8 xor rdi, rsi call __popcountdi2 add rsp, 8 ret 

OTOH, if you have a platform that supports POPCOUNT, for example, x64 models, including nehalem and after (which have popcnt ), you get ( Godbolt GCC Online ):

 hammingDistance: xor rdi, rsi popcnt rax, rdi ret 

which should be reset faster, especially once nested.


But back to the original question. Can you take the Hamming XOR weight of two lines to find their Hamming distance? i.e:

 HD = HW (x xor y) 
+8
c gcc intrinsics hamming distance
source share
2 answers

Hamming distance between two lines of equal length, x and y , is defined as the number of positions in which they differ. In the case of x and y , which are bistre strings, x^y is a string with 1 exactly in the positions that they differ. So HammingDistance(x,y) = Number of 1s in x^y , for a bitstron. In addition, HammingWeight(x) = number of 1s in x for the bit string x . So your first statement HammingDistance(x,y) = HammingWeight(x^y) is true for bit strings. By setting this, it is clear that your implementation is correct.

+5
source share

Yes it works. For each bit, a bit is 1 if and only if the input bits are different. Therefore, applied to an integer bit vector, the result has as many bits (HW) as the inputs have different bits (HD). And your code seems to be taking advantage of this relationship. In fact, this shortcut is even mentioned later in the Hamming weighted article you refer to ( Effective Implementation ):

The Hamming distance of the two words A and B can be calculated as the Hamming weight A xor B.

+3
source share

All Articles