Fast Hamming distance between two bitsets

I am writing software that relies heavily on (1) accessing single bits and (2) computing the Hamming distance between two bitsets A and B (i.e. the number of bits that differ between A and B). The bitsets are fairly large, between 10K and 1M bits, and I have many of them. Since the bitset sizes are not known at compile time, I currently use vector<bool>, but I plan to switch to boost::dynamic_bitset soon.

My questions are:

(1) Any ideas on which implementation has the fastest single-bit access time?

(2) To compute the Hamming distance, the naive approach is to loop over the individual bits and count the positions where the two bitsets differ. But I feel it could be much faster to loop over bytes instead of bits, compute R = byteA XOR byteB, and look up the "local" distance for R in a table with 256 entries. Another option would be to store a 256 x 256 table and look up the distance between byteA and byteB directly, without the XOR. So my question is: any idea how to implement this on top of std::vector<bool> or boost::dynamic_bitset? In other words, do you know whether there is a way to access the underlying array of bytes, or do I need to reimplement everything from scratch?
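The byte-wise table lookup described in (2) can be sketched as follows. This is only an illustration under stated assumptions: the bits are assumed to already live in a plain std::vector<std::uint8_t> (not in vector<bool> or dynamic_bitset), and the names make_popcount_table and hamming_distance_bytes are hypothetical helpers, not part of any library.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Build the 256-entry table: table[r] = number of set bits in the byte r.
inline std::array<std::uint8_t, 256> make_popcount_table() {
    std::array<std::uint8_t, 256> table{};  // table[0] == 0
    for (std::size_t r = 1; r < 256; ++r) {
        // Bits in r = lowest bit of r + bits in r with that bit shifted out.
        table[r] = static_cast<std::uint8_t>((r & 1u) + table[r >> 1]);
    }
    return table;
}

// Hamming distance between two equally sized byte buffers
// (hypothetical helper; assumes the bits are stored 8 per byte).
inline std::size_t hamming_distance_bytes(const std::vector<std::uint8_t>& a,
                                          const std::vector<std::uint8_t>& b) {
    assert(a.size() == b.size());
    static const std::array<std::uint8_t, 256> table = make_popcount_table();
    std::size_t distance = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // XOR leaves a 1 exactly where the two bytes differ.
        distance += table[static_cast<std::uint8_t>(a[i] ^ b[i])];
    }
    return distance;
}
```

A 256 x 256 table would need 65536 entries in exchange for saving one XOR per byte, so the single 256-entry table is probably the more cache-friendly choice, though only measurement can confirm that.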

+5
3 answers

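(1) Probably vector<char> (or even vector<int>), but that wastes at least 7/8 of the space. In exchange, no bit unpacking is needed when each bit is stored in a byte or more. As for vector<bool> versus dynamic_bitset, I don't know which is faster; it probably depends on your C++ implementation.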
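For question (1), single-bit access in a packed representation comes down to one shift and one mask, whereas a byte-per-bit layout needs neither but uses eight times the memory. A minimal sketch of the packed variant, assuming 64-bit words and a size chosen at run time (PackedBits is a hypothetical class written for illustration, not how vector<bool> or dynamic_bitset are actually implemented):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal packed bit array: 64 bits per word, size set at run time.
class PackedBits {
public:
    explicit PackedBits(std::size_t bit_count)
        : bit_count_(bit_count), words_((bit_count + 63) / 64, 0) {}

    // Read bit i: pick the word, shift the bit down, mask it out.
    bool test(std::size_t i) const {
        assert(i < bit_count_);
        return ((words_[i / 64] >> (i % 64)) & 1u) != 0;
    }

    // Write bit i: build a one-bit mask, then OR it in or AND it out.
    void set(std::size_t i, bool value) {
        assert(i < bit_count_);
        const std::uint64_t mask = std::uint64_t{1} << (i % 64);
        if (value) {
            words_[i / 64] |= mask;
        } else {
            words_[i / 64] &= ~mask;
        }
    }

    std::size_t size() const { return bit_count_; }

    // Direct access to the underlying words, e.g. for word-wise XOR.
    const std::vector<std::uint64_t>& words() const { return words_; }

private:
    std::size_t bit_count_;
    std::vector<std::uint64_t> words_;
};
```

Owning the storage like this also sidesteps the question of how to reach the underlying blocks, since words() exposes them directly.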

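(2) boost::dynamic_bitset has operator^ and a count member, which together compute the Hamming distance, probably quickly, though at the cost of a temporary bitset. You can also get at the underlying buffer with to_block_range; to use it, you need to supply an OutputIterator that receives the blocks.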

+3

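Whatever the container, the basic approach is the same: process the data a word at a time. XOR the two words, then popcount the result, using a built-in popcount if one is available or a table lookup otherwise (a table of 256 entries for bytes).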
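The XOR-then-popcount idea can be sketched like this, assuming the bits are stored in a std::vector<std::uint64_t>. The helper names popcount64 and hamming_distance_words are hypothetical; the popcount shown is a portable fallback loop, and where available it could be swapped for std::popcount from C++20's <bit> header or a compiler builtin such as __builtin_popcountll on GCC and Clang.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Portable popcount: each iteration clears the lowest set bit,
// so the loop runs once per set bit.
inline unsigned popcount64(std::uint64_t x) {
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;
        ++n;
    }
    return n;
}

// Hamming distance between two equally sized word buffers
// (hypothetical helper; unused bits in the last word must be zero in both).
inline std::size_t hamming_distance_words(const std::vector<std::uint64_t>& a,
                                          const std::vector<std::uint64_t>& b) {
    assert(a.size() == b.size());
    std::size_t distance = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        distance += popcount64(a[i] ^ b[i]);  // differing bits in word i
    }
    return distance;
}
```

For 1M bits this loop touches about 16K words per bitset, compared with 1M iterations for the bit-by-bit approach.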

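[Edit: It looks like you can get at the blocks with boost::dynamic_bitset::to_block_range, with Block being something like int or long. Unfortunately, it writes to an OutputIterator rather than giving you an InputIterator-style range, so you cannot simply read the blocks as ints in a loop. Given that, the simplest option is probably operator^ followed by count().]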

+2

I know I will be downvoted for heresy, but here it is: you can get a pointer to the actual data of a vector using &vector[0] (for vector<bool>, YMMV). Then you can iterate over it with C-style code: cast the pointer to an int pointer or something similar, perform the Hamming arithmetic as described above, and advance the pointer one word at a time. This only works because you know the bits are packed contiguously, and it is fragile (for example, if the vector is modified, it may reallocate and move its memory).
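A rough sketch of this raw-pointer walk, under some assumptions: the bits are stored in a std::vector<unsigned char> (taking &v[0] of a vector<bool> does not give a pointer to packed storage), and std::memcpy is swapped in for the pointer cast to avoid alignment and aliasing problems. The name hamming_distance_raw is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hamming distance by walking the raw storage one 64-bit word at a time
// (hypothetical helper; assumes 8 packed bits per byte).
inline std::size_t hamming_distance_raw(const std::vector<unsigned char>& a,
                                        const std::vector<unsigned char>& b) {
    assert(a.size() == b.size());
    if (a.empty()) {
        return 0;  // &a[0] is not valid on an empty vector
    }

    const unsigned char* pa = &a[0];
    const unsigned char* pb = &b[0];
    std::size_t remaining = a.size();
    std::size_t distance = 0;

    // Whole words: copy 8 bytes from each buffer, XOR, count set bits.
    while (remaining >= sizeof(std::uint64_t)) {
        std::uint64_t wa;
        std::uint64_t wb;
        std::memcpy(&wa, pa, sizeof wa);
        std::memcpy(&wb, pb, sizeof wb);
        for (std::uint64_t diff = wa ^ wb; diff != 0; diff &= diff - 1) {
            ++distance;  // one iteration per differing bit
        }
        pa += sizeof wa;
        pb += sizeof wb;
        remaining -= sizeof wa;
    }

    // Tail: the last few bytes that do not fill a whole word.
    for (; remaining > 0; --remaining, ++pa, ++pb) {
        for (unsigned diff = static_cast<unsigned>(*pa ^ *pb); diff != 0;
             diff &= diff - 1) {
            ++distance;
        }
    }
    return distance;
}
```

The pointers are taken fresh on every call and the vectors are passed by const reference, so nothing here survives a reallocation of the underlying storage.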

0