A simplified version of a nice method from Tabei et al., "Single versus Multiple Sorting in All Pairs Similarity Search", say for pairs with Hammingdist 1:
- sort all bit strings on the first 32 bits
- look at the blocks of strings where the first 32 bits are all the same; these blocks will be relatively small
- run pdist on each of these blocks, looking for Hammingdist(left 32) 0 + Hammingdist(rest) <= 1.
This misses a fraction, e.g. 32/128, of the nearby pairs: those that have Hammingdist(left 32) 1 + Hammingdist(rest) 0. If you really want these, repeat the above with "first 32" → "last 32".
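
The steps above can be sketched in plain Python. This is only a minimal illustration, not the code from the paper: it assumes 128-bit strings stored as Python ints, and the names `near_pairs`, `first32` and `last32` are made up here. A brute-force double loop stands in for pdist inside each block.

```python
from itertools import combinations, groupby

NBITS = 128    # assumed width of each bit string
KEYBITS = 32   # width of the sort key

def hamming(a, b):
    """Number of differing bits between two ints."""
    return bin(a ^ b).count("1")

def first32(s):
    return s >> (NBITS - KEYBITS)

def last32(s):
    return s & ((1 << KEYBITS) - 1)

def near_pairs(strings, key, maxdist=1):
    """Sort on key, then brute-force each block of equal keys."""
    order = sorted(range(len(strings)), key=lambda i: key(strings[i]))
    found = set()
    for _, grp in groupby(order, key=lambda i: key(strings[i])):
        # the sort is stable, so indices in a block are ascending
        for i, j in combinations(list(grp), 2):
            if hamming(strings[i], strings[j]) <= maxdist:
                found.add((i, j))
    return found

strings = [0, 1, 1 << 127, 3, (1 << 127) | 1]
pass1 = near_pairs(strings, first32)   # misses pairs differing in the first 32 bits
pass2 = near_pairs(strings, last32)    # picks those up
all_pairs = pass1 | pass2
```

On this toy input the first pass alone misses the pairs whose single differing bit is in the first 32 bits; the second pass on the last 32 bits finds them.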
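The method can be extended. Take, for example, Hammingdist <= 2 on 4 32-bit words; the mismatches must split across the words in one of these ways:
2000 0200 0020 0002 1100 1010 1001 0110 0101 0011,
so at least 2 of the words must have 0 mismatches; sort on those in the same way.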
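
For Hammingdist <= 2 on four 32-bit words, any pair within distance 2 agrees exactly on at least two of the four words, so one way to catch every such pair is to sort on each of the 6 possible pairs of words in turn. The sketch below is one possible reading of that idea, not the paper's implementation; the 4 x 32-bit layout stored in a Python int and the names `words` and `pairs_within_2` are assumptions made for illustration.

```python
from itertools import combinations, groupby

WORDBITS = 32
NWORDS = 4

def hamming(a, b):
    """Number of differing bits between two ints."""
    return bin(a ^ b).count("1")

def words(s):
    """Split a 128-bit int into four 32-bit words, most significant first."""
    mask = (1 << WORDBITS) - 1
    return tuple((s >> (WORDBITS * (NWORDS - 1 - k))) & mask
                 for k in range(NWORDS))

def pairs_within_2(strings):
    split = [words(s) for s in strings]
    found = set()
    # try every choice of two words that are required to match exactly
    for a, b in combinations(range(NWORDS), 2):
        def keyf(i, a=a, b=b):
            return (split[i][a], split[i][b])
        order = sorted(range(len(strings)), key=keyf)
        for _, grp in groupby(order, key=keyf):
            for i, j in combinations(list(grp), 2):
                if hamming(strings[i], strings[j]) <= 2:
                    found.add((i, j))
    return found

strings = [0, (1 << 127) | (1 << 95), 3, 7]
result = pairs_within_2(strings)
```

Six sorts is the price of not missing anything; if a small fraction of missed pairs is acceptable, fewer sorts will do.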
(Btw, sketchsort-0.0.7.tar is 99% src/boost/, build/, .svn/.)