This problem can be solved by writing a simple Map and Reduce job. I am not claiming that this is the best solution, or that it is the only one.
In addition, you revealed in the comments that k is in the hundreds, that there are millions of bit strings, and that each of them is 512 or 1024 bits long.
Mapper pseudocode:
- Given Q;
- For each bit string b, compute similarity = b & Q (bitwise AND)
- Emit (similarity, b)
Now the combiner can consolidate the list of all bit strings from each mapper that have the same similarity.
Reducer pseudocode:
- Consume (similarity, listOfBitStringsWithThisSimilarity);
- Output them in descending order of similarity value.
You can extract the top-k bit strings from the reducer output.
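The steps above can be sketched in plain Python as a single-process simulation of the map, combine, and reduce phases. This is only an illustration, not a real MapReduce job: the function names (`similarity`, `map_phase`, `combine`, `reduce_phase`, `top_k`) are hypothetical, bit strings are represented as Python integers, and the similarity score is assumed to be the number of set bits in `b & Q`, since the pseudocode does not say how the result of the AND is ranked.

```python
from collections import defaultdict


def similarity(b: int, q: int) -> int:
    # Assumption: rank by how many set bits b shares with q.
    return bin(b & q).count("1")


def map_phase(bit_strings, q):
    # Mapper: emit (similarity, b) for every bit string b.
    for b in bit_strings:
        yield similarity(b, q), b


def combine(pairs):
    # Combiner: group bit strings that have the same similarity.
    groups = defaultdict(list)
    for sim, b in pairs:
        groups[sim].append(b)
    return groups


def reduce_phase(groups):
    # Reducer: output the groups in descending order of similarity.
    for sim in sorted(groups, reverse=True):
        yield sim, groups[sim]


def top_k(bit_strings, q, k):
    # Walk the reducer output and stop once k bit strings are collected.
    result = []
    if k <= 0:
        return result
    for sim, group in reduce_phase(combine(map_phase(bit_strings, q))):
        for b in group:
            result.append((sim, b))
            if len(result) == k:
                return result
    return result
```

For example, `top_k([0b0001, 0b0111, 0b1111], 0b1111, 2)` walks the groups from the highest similarity down and stops after two bit strings. On a real cluster the grouping and the sort by key would be handled by the framework's shuffle phase rather than by a dictionary in memory.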
So, the MapReduce paradigm is probably the classic solution you're looking for.