The hamming distance is a metric; therefore, it satisfies the triangle inequality. For each bit string in your database, you can save its distance for hamming to some predetermined constant bit string. Then you can use triangle inequality to filter bitstrings in the database.
So let's say
C <- some constant bitstring S <- bitstring you're trying to find the best match for B <- a bitstring in the database distS <- hamming_dist(S,C) distB <- hamming_dist(B,C)
So, for each B you save it with the corresponding distB .
The lower bound for hamming(B,S) will then be abs(distB-distS) . The upper bound will be distB+distS .
You can drop all B so that the lower bound is higher than the lower upper bound.
I am not 100% sure about the best way to choose C I think you would like it to be a bit string close to the "center" of your metric space bitstron.
source share