Based on what I found when searching for locality-sensitive hashing, and on reading the paper "Similarity Estimation Techniques from Rounding Algorithms", I would say this:
This question is somewhat broad, so I'll just give a minimal (abstract) example here:
We have 6 (= n) vectors in our dataset with d bits each. Suppose we use 2 (= N) random permutations.
Let the first random permutation begin! Remember that we permute the bits, not the order of the vectors. After permuting the bits, we keep the vectors in sorted order, for example:
v1 v5 v0 v3 v2 v4
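As a minimal sketch of this step, the snippet below permutes the bits of six vectors and sorts them by their permuted bits. The 4-bit values, the helper name permute_bits, and the fixed permutation are all assumptions made for illustration (a real run would draw the permutation at random); they were chosen so that the sorted order matches the example above.

```python
def permute_bits(vector, permutation):
    """Rearrange bits so that output bit i is input bit permutation[i]."""
    return tuple(vector[i] for i in permutation)

# Six hypothetical 4-bit vectors v0..v5 (illustrative values only).
dataset = {
    "v0": (0, 1, 1, 0),
    "v1": (0, 0, 1, 0),
    "v2": (1, 1, 0, 1),
    "v3": (0, 1, 1, 1),
    "v4": (1, 1, 1, 1),
    "v5": (1, 0, 0, 0),
}

# One fixed permutation of the bit positions (assumed; normally random).
permutation = [3, 1, 0, 2]

# Permute the bits of every vector, then sort the vectors by their permuted bits.
permuted = {name: permute_bits(bits, permutation) for name, bits in dataset.items()}
sorted_names = sorted(permuted, key=permuted.get)
print(sorted_names)  # ['v1', 'v5', 'v0', 'v3', 'v2', 'v4']
```

Note that only the bit positions inside each vector are shuffled; the new ordering of the vectors comes from the sort.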
Now the query vector q arrives, but it is almost certainly not identical to any vector in our dataset (after permutation), so a binary search will not find an exact match.
However, the search will end up between two vectors. So we can imagine the scenario like this (for example, q lies between v0 and v3):
v1
v5
v0 <-- up pointer
   <-- q lies here
v3 <-- down pointer
v2
v4
Now we move the pointers up and down, looking for the vector vi that matches q in the largest number of bits. Let's say that it is v0.
Similarly, we perform the second permutation and find a vector vi, say v4. Now we compare v0 from the first permutation with v4 to see which one is closest to q, i.e. which one has the most bits equal to q.
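The whole procedure can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's reference implementation: the function names (build_index, query, hamming), the seed, and the window parameter (how many steps each pointer moves) are all hypothetical choices.

```python
import random
from bisect import bisect_left

def permute_bits(vector, permutation):
    """Rearrange bits so that output bit i is input bit permutation[i]."""
    return tuple(vector[i] for i in permutation)

def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def build_index(vectors, num_permutations, seed=0):
    """For each random bit permutation, keep the permuted vectors sorted."""
    rng = random.Random(seed)
    d = len(vectors[0])
    index = []
    for _ in range(num_permutations):
        permutation = rng.sample(range(d), d)
        # Store each permuted vector together with its original position.
        table = sorted((permute_bits(v, permutation), i) for i, v in enumerate(vectors))
        index.append((permutation, table))
    return index

def query(vectors, index, q, window=2):
    """Return (distance, position) of the best candidate over all permutations."""
    best = None
    for permutation, table in index:
        pq = permute_bits(q, permutation)
        # Binary search: q falls between table[pos - 1] and table[pos].
        pos = bisect_left(table, (pq, -1))
        # Move the up and down pointers `window` steps away from that gap.
        for _, i in table[max(0, pos - window):pos + window]:
            candidate = (hamming(vectors[i], q), i)
            if best is None or candidate < best:
                best = candidate
    return best

# Hypothetical data: six 4-bit vectors and two random permutations.
vectors = [
    (0, 1, 1, 0), (0, 0, 1, 0), (1, 1, 0, 1),
    (0, 1, 1, 1), (1, 1, 1, 1), (1, 0, 0, 0),
]
index = build_index(vectors, num_permutations=2)
q = (0, 1, 0, 0)
distance, position = query(vectors, index, q)
print(f"closest candidate: v{position} at Hamming distance {distance}")
```

Because permuting the bits does not change the Hamming distance, candidates are compared on the original vectors. The result is approximate: the true nearest neighbour is only found if it ends up within the window in at least one of the sorted lists.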
However, if you are looking for a ready-made implementation, you should ask on Software Recommendations. I would also look at the paper I linked to find out whether the authors made their code publicly available, or whether they would be willing to share it if you contact them.
gsamaras