Efficient byte search algorithm in a bit matrix

Question

Given a byte array uint8_t data[N], what is an efficient way to find a byte uint8_t search in it, even if search is not octet-aligned? That is, the first three bits of search could be in data[i] and the next 5 bits in data[i+1].

My current method involves creating a function bool get_bit(const uint8_t* src, struct internal_state* state) (struct internal_state contains a mask that is shifted to the right, ANDed with src and returned, while maintaining size_t src_index < size_t src_len), left-shifting the returned bits into a uint8_t my_register and comparing it with search each time, and using state->src_index and state->src_mask to get the position of the matching byte.
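For reference, the bit-by-bit method described above might look roughly like the following sketch. The names bit_cursor, next_bit and find_byte_bitwise are made up for illustration (they stand in for the struct internal_state and get_bit mentioned in the question), and bits are assumed to be numbered MSB-first within each byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the question's struct internal_state. */
struct bit_cursor {
    size_t  index;  /* current byte in src */
    uint8_t mask;   /* current bit in that byte, moves from 0x80 down to 0x01 */
};

/* Returns the next bit (MSB-first) and advances the cursor. */
static int next_bit(const uint8_t *src, struct bit_cursor *c)
{
    int bit = (src[c->index] & c->mask) != 0;
    c->mask >>= 1;
    if (c->mask == 0) {
        c->mask = 0x80;
        c->index++;
    }
    return bit;
}

/* Returns the bit offset of the first match, or -1 if there is none. */
long find_byte_bitwise(const uint8_t *src, size_t len, uint8_t search)
{
    struct bit_cursor c = { 0, 0x80 };
    uint8_t reg = 0;  /* shift register holding the last 8 bits read */

    for (size_t consumed = 1; consumed <= len * 8; consumed++) {
        reg = (uint8_t)((reg << 1) | next_bit(src, &c));
        if (consumed >= 8 && reg == search)
            return (long)(consumed - 8);
    }
    return -1;
}
```

This does one comparison per bit of input, which is the baseline the answers below try to improve on.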

Is there a better way to do this?

c algorithm search

user80551 May 11, '15 at 18:48

5 answers

Lukas Thomsen · Answer 1 · 2015-05-11T19:50:53+0000

If you are searching for an eight-bit pattern in a large array, you can implement a sliding window over 16-bit values and check whether the pattern is contained in the two bytes that form each 16-bit value.

To be portable you have to take care of endianness issues, which my implementation does by building the 16-bit value to search manually. The high byte is always the byte currently being iterated over, and the low byte is the following byte. If you do a simple conversion such as value = *((unsigned short *)pData), you will run into trouble on x86 processors, which are little-endian.
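To illustrate why building the window by hand is the safer route, here is a small sketch (both helper names are made up for this example). Composing the value with shifts gives the same result on every host, whereas copying the two bytes into a native 16-bit integer gives a result that depends on the host byte order.

```c
#include <stdint.h>
#include <string.h>

/* Portable: p[0] always ends up in the high byte. */
uint16_t window_from_bytes(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Host-dependent: reinterprets the two bytes as a native 16-bit integer.
 * memcpy is used here instead of a pointer cast so that the example
 * itself has no alignment or aliasing problems. */
uint16_t window_native(const uint8_t *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

For the bytes {0x12, 0x34}, window_from_bytes returns 0x1234 everywhere, while window_native returns 0x3412 on a little-endian host and 0x1234 on a big-endian one.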

Once value, cmp and mask are set up, cmp and mask are shifted right one bit at a time. If the pattern is not found starting within the high byte, the loop continues with the next byte as the start byte.

Here is my implementation, including some debugging printouts (the function returns the bit position or -1 if the pattern is not found):

    #include <stdio.h>

    int findPattern(unsigned char *data, int size, unsigned char pattern)
    {
        int result = -1;
        unsigned char *pData;
        unsigned char *pEnd;
        unsigned short value;
        unsigned short mask;
        unsigned short cmp;
        int tmpResult;

        if ((data != NULL) && (size > 0))
        {
            pData = data;
            pEnd = data + size;

            while ((pData < pEnd) && (result == -1))
            {
                if ((pData + 1) < pEnd) /* still at least two bytes to check? */
                {
                    /* debug output; only printed here, where pData[1] is valid */
                    printf("\n\npData = {%02x, %02x, ...};\n", pData[0], pData[1]);

                    tmpResult = (int)(pData - data) * 8; /* calculate bit offset according to current byte */

                    /* avoid endianness troubles by "manually" building value! */
                    value = *pData << 8;
                    pData++;
                    value += *pData;

                    /* create a sliding window to check if search pattern is within value */
                    cmp = pattern << 8;
                    mask = 0xFF00;
                    while (mask > 0x00FF) /* the low byte is checked within next iteration! */
                    {
                        printf("cmp = %04x, mask = %04x, tmpResult = %d\n", cmp, mask, tmpResult);

                        if ((value & mask) == cmp)
                        {
                            result = tmpResult;
                            break;
                        }

                        tmpResult++; /* count bits! */
                        mask >>= 1;
                        cmp >>= 1;
                    }
                }
                else
                {
                    /* only one chance left if there is only one byte left to check! */
                    if (*pData == pattern)
                    {
                        result = (int)(pData - data) * 8;
                    }

                    pData++;
                }
            }
        }

        return (result);
    }

nekavally · Answer 2 · 2015-05-11T19:28:34+0000

I don't know if it would be better, but I would use a sliding window.

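    #include <stddef.h>
    #include <stdint.h>

    /* Note: bits are consumed LSB-first, i.e. bit 0 of data[0] is bit position 0. */
    long find_byte(const uint8_t *data, size_t length, uint8_t search)
    {
        size_t counter = 0;
        unsigned feeder = 8;   /* number of valid bits currently in window */
        uint32_t window;

        if (length == 0)
            return -1;

        window = data[0];
        while (search ^ (window & 0xff)) {
            window >>= 1;
            feeder--;
            if (feeder < 8) {
                counter++;
                if (counter >= length) {
                    feeder = 0;
                    break;
                }
                window |= (uint32_t)data[counter] << feeder;
                feeder += 8;
            }
        }
        /* Returns index of first bit of first sequence occurrence or -1 if sequence is not found */
        return (feeder > 0) ? (long)((counter + 1) * 8 - feeder) : -1;
    }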

Also, with some changes, you can use this method to search for a sequence of bits of arbitrary length (from 1 to 64-array_element_size_in_bits).
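As a rough sketch of that generalization (the function name and signature are assumptions, not part of the answer), the same right-shifting window can be widened to 64 bits so that it holds any pattern of 1 to 56 bits. As in the sliding-window code above, bits are taken LSB-first, so bit 0 of data[0] is position 0 and the pattern's least significant bit is matched first.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the bit position of the first occurrence of the low `nbits`
 * bits of `pattern`, or -1 if it is not found or nbits is out of range. */
long find_bits_lsb(const uint8_t *data, size_t len, uint64_t pattern, unsigned nbits)
{
    if (nbits < 1 || nbits > 56 || len * 8 < nbits)
        return -1;

    const uint64_t mask = (UINT64_C(1) << nbits) - 1;
    const size_t total = len * 8;
    uint64_t window = 0;   /* unread bits, next bit to test is bit 0 */
    unsigned have = 0;     /* number of valid bits in window */
    size_t next = 0;       /* next byte to feed into the window */

    pattern &= mask;
    for (size_t pos = 0; pos + nbits <= total; pos++) {
        /* top up the window until it holds at least nbits bits */
        while (have < nbits && next < len) {
            window |= (uint64_t)data[next++] << have;
            have += 8;
        }
        if ((window & mask) == pattern)
            return (long)pos;
        window >>= 1;
        have--;
    }
    return -1;
}
```

The 56-bit limit comes from needing room for one more 8-bit element in the 64-bit window before the next shift, which matches the "64 minus element size" bound mentioned above.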

John Bollinger · Answer 3 · 2015-05-11T19:41:18+0000

I do not think you can do much better than this in C:

    #include <stdint.h>

    /*
     * Searches for the 8-bit pattern represented by 'needle' in the bit array
     * represented by 'haystack'.
     *
     * Returns the index *in bits* of the first appearance of 'needle', or
     * -1 if 'needle' is not found.
     */
    int search(uint8_t needle, int num_bytes, uint8_t haystack[num_bytes]) {
        if (num_bytes > 0) {
            uint16_t window = haystack[0];

            if (window == needle) return 0;
            for (int i = 1; i < num_bytes; i += 1) {
                /* parentheses matter: '+' binds tighter than '<<' */
                window = (uint16_t)((window << 8) | haystack[i]);

                /* Candidate for unrolling: */
                for (int j = 7; j >= 0; j -= 1) {
                    /* parentheses matter: '==' binds tighter than '&' */
                    if (((window >> j) & 0xff) == needle) {
                        return 8 * i - j;
                    }
                }
            }
        }
        return -1;
    }

The basic idea is to handle the 87.5% of cases that cross the boundary between consecutive bytes by pairing up bytes in a wider data type (uint16_t in this case). You could adjust it to use an even wider data type, but I'm not sure that would gain anything.
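For comparison, a variant with a wider window might look like the sketch below. It is only an illustration under assumed names (search_wide is not from the answer): a uint32_t holds the previous byte plus up to three new bytes, so the window is rebuilt once per three input bytes rather than once per byte. Whether that is actually faster would need measuring.

```c
#include <stddef.h>
#include <stdint.h>

/* MSB-first search; returns the bit index of the first match or -1. */
long search_wide(uint8_t needle, const uint8_t *hay, size_t n)
{
    if (n == 0)
        return -1;
    if (hay[0] == needle)
        return 0;

    size_t i = 1;
    while (i < n) {
        /* window = byte i-1 followed by up to three new bytes */
        size_t take = (n - i < 3) ? n - i : 3;
        uint32_t window = hay[i - 1];
        for (size_t k = 0; k < take; k++)
            window = (window << 8) | hay[i + k];

        /* offset 0 of byte i-1 was already tested, so start at offset 1 */
        for (unsigned off = 1; off <= 8 * take; off++) {
            unsigned shift = (unsigned)(8 * take) - off;
            if (((window >> shift) & 0xff) == needle)
                return (long)(8 * (i - 1) + off);
        }
        i += take;
    }
    return -1;
}
```

The last offset tested in each block is the byte-aligned position of its final byte, which is why the next block can start at offset 1 again.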

What you cannot safely or easily do is anything that involves casting part or all of your array to a wider integer type via a pointer (i.e. (uint16_t *)&haystack[i]). You cannot be sure of proper alignment for such a cast, nor of the byte order with which the result will be interpreted.

harold · Answer 4 · 2015-05-11T22:00:16+0000

If AVX2 is acceptable (with earlier versions it doesn't work out so well, though you can still do something there), you can search in many places at the same time. I could not test this on my machine (only compile it), so the following is meant to give you an idea of how you could approach it rather than copy-and-paste code, and I will try to explain it instead of just dumping the code.

The main idea is to read a uint64_t, shift it right by all the values that make sense (0 through 7), then for each of those 8 new uint64_t values, test whether the byte is in there. A small complication: for the uint64_t values shifted by more than 0, the highest position should not be counted, since it has zeros shifted into it that might not be in the actual data. Once this is done, the next uint64_t should be read at an offset of 7 from the current one, otherwise there is a boundary that is not checked. These unaligned loads are not so bad though, especially since they are narrow.
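To make the scheme easier to follow without SIMD, here is a plain C sketch of the same idea (the function name is made up, and this is not the AVX2 code of this answer). It assembles the 64-bit value byte by byte in little-endian order, so it never reads past the end of the array, and it treats bits LSB-first, which is what shifting a little-endian qword to the right amounts to. Because it loops over byte positions first and shifts second, it returns the earliest match.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the LSB-first bit position of the first match, or -1. */
long find_byte_shifted_qword(const uint8_t *data, size_t n, uint8_t find)
{
    /* blocks advance by 7 bytes so that consecutive blocks overlap by one */
    for (size_t i = 0; i < n; i += 7) {
        size_t avail = (n - i < 8) ? n - i : 8;

        /* little-endian assembly of up to 8 bytes */
        uint64_t d = 0;
        for (size_t k = 0; k < avail; k++)
            d |= (uint64_t)data[i + k] << (8 * k);

        for (size_t p = 0; p < avail; p++) {          /* byte position */
            for (unsigned s = 0; s < 8; s++) {        /* right shift */
                /* for shifts > 0 the top position has zeros shifted in */
                if (s > 0 && p + 1 >= avail)
                    break;
                if (((d >> (8 * p + s)) & 0xff) == find)
                    return (long)(8 * (i + p) + s);
            }
        }
    }
    return -1;
}
```

The SIMD version performs the 8 shifts and all byte comparisons of one block at once instead of in the two inner loops.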

So, now for some (untested and incomplete, see below) code:

    /* fragment; requires <immintrin.h>, <stdint.h> and AVX2 support */
    __m256i needle = _mm256_set1_epi8(find);
    size_t i;
    /* every iteration loads 8 bytes, so only loop while 8 real bytes remain */
    for (i = 0; i + 8 <= n; i += 7) {
        // unaligned load here, but that's OK
        uint64_t d = *(uint64_t*)(data + i);
        __m256i x = _mm256_set1_epi64x(d);
        __m256i low = _mm256_srlv_epi64(x, _mm256_set_epi64x(3, 2, 1, 0));
        __m256i high = _mm256_srlv_epi64(x, _mm256_set_epi64x(7, 6, 5, 4));
        low = _mm256_cmpeq_epi8(low, needle);
        high = _mm256_cmpeq_epi8(high, needle);
        // in the qword right-shifted by 0, all positions are valid
        // otherwise, the top position corresponds to an incomplete byte
        uint32_t lowmask = 0x7f7f7fffu & _mm256_movemask_epi8(low);
        uint32_t highmask = 0x7f7f7f7fu & _mm256_movemask_epi8(high);
        uint64_t mask = lowmask | ((uint64_t)highmask << 32);
        if (mask) {
            // 0-based index of the lowest set bit (ffs would be 1-based);
            // note this picks the smallest shift, not necessarily the earliest match
            int bitindex = __builtin_ctzll(mask);
            // the bit-index and byte-index are swapped
            return 8 * (i + (bitindex & 7)) + (bitindex >> 3);
        }
    }

The funny "bit-index and byte-index are swapped" business arises because searching within a qword is done byte by byte, and the results of those comparisons end up in 8 adjacent bits, while the search for "shifted by 1" ends up in the next 8 bits, and so on. So in the resulting masks, the index of the byte that contains the 1 is a bit offset, but the bit index within that byte is actually the byte offset. For example, bit 0x8000 of the comparison result corresponds to finding the byte at the 7th byte of the qword that was right-shifted by 1, so the actual index is 8 * 7 + 1.
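The index arithmetic can be isolated in a tiny helper, shown here as a sketch with a made-up name. It assumes a zero-based bit index into the combined 64-bit comparison mask, where bits 8*s through 8*s+7 belong to the qword shifted right by s.

```c
#include <stddef.h>

/* i        : byte offset of the block that was loaded
 * bitindex : zero-based index of a set bit in the 64-bit comparison mask
 * returns  : bit position of the match in the whole array */
size_t mask_bit_to_position(size_t i, unsigned bitindex)
{
    unsigned byte_in_qword = bitindex & 7;   /* which byte matched */
    unsigned shift         = bitindex >> 3;  /* which shifted copy it was found in */
    return 8 * (i + byte_in_qword) + shift;
}
```

With i = 0 and bit 15 (mask value 0x8000) this gives 8 * 7 + 1 = 57, matching the example above.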

There is also the problem of the tail, the part of the data left over after all the 7-byte blocks have been processed. It can be handled in much the same way, but now more positions contain bogus bytes. There are now n - i bytes left, so the mask must have n - i bits set in the lowest byte and one fewer in all other bytes (for the same reason as before: the other positions have zeros shifted in). Also, if there is exactly 1 byte "left", it is not really left over because it has already been tested, but that does not really matter. I assume the data is sufficiently padded that accessing out of bounds does not matter. Here it is, untested:

    if (i + 1 < n) {
        // make n-i-1 bits, then copy them to every byte
        uint32_t validh = ((1u << (n - i - 1)) - 1) * 0x01010101;
        // the lowest position has an extra valid bit, set lowest zero
        uint32_t validl = (validh + 1) | validh;
        // reads 8 bytes: relies on the data being padded past the end
        uint64_t d = *(uint64_t*)(data + i);
        __m256i x = _mm256_set1_epi64x(d);
        __m256i low = _mm256_srlv_epi64(x, _mm256_set_epi64x(3, 2, 1, 0));
        __m256i high = _mm256_srlv_epi64(x, _mm256_set_epi64x(7, 6, 5, 4));
        low = _mm256_cmpeq_epi8(low, needle);
        high = _mm256_cmpeq_epi8(high, needle);
        uint32_t lowmask = validl & _mm256_movemask_epi8(low);
        uint32_t highmask = validh & _mm256_movemask_epi8(high);
        uint64_t mask = lowmask | ((uint64_t)highmask << 32);
        if (mask) {
            // 0-based index of the lowest set bit (ffs would be 1-based)
            int bitindex = __builtin_ctzll(mask);
            return 8 * (i + (bitindex & 7)) + (bitindex >> 3);
        }
    }

samgak · Answer 5 · 2015-05-12T05:19:03+0000

If you are searching a large amount of memory and can afford an expensive setup, another approach is to use a 64K lookup table. For each possible 16-bit value, the table stores a byte containing the bit shift offset at which the octet matches (+1, so that 0 can indicate no match). You can initialize it like this:

    #include <stdint.h>
    #include <string.h>

    /* a static array: a malloc() call is not a valid file-scope initializer in C */
    uint8_t g_pLookupTable[65536];

    void initLUT(uint8_t octet)
    {
        memset(g_pLookupTable, 0, 65536); // zero out

        for (int i = 0; i < 65536; i++) {
            for (int j = 7; j >= 0; j--) {
                if (((i >> j) & 255) == octet) {
                    g_pLookupTable[i] = j + 1;
                    break;
                }
            }
        }
    }

Note that the case where the value is shifted by 8 bits is not included (the reason will become obvious in a minute).

Then you can scan through your byte array as follows:

    /* octet must be the same value that was passed to initLUT() */
    int findByteMatch(uint8_t* pArray, uint8_t octet, int length)
    {
        if (length > 0) {
            uint16_t index = (uint16_t)pArray[0];

            if (index == octet)
                return 0;
            for (int bit, i = 1; i < length; i++) {
                index = (uint16_t)((index << 8) | pArray[i]);
                if ((bit = g_pLookupTable[index]) != 0)
                    return (i * 8) - (bit - 1);
            }
        }
        return -1;
    }

Further optimization:

Read 32 bits (or however many) at a time from pArray into a uint32_t, then shift and AND to extract one byte at a time, OR it into index and test, before reading another 4 bytes.
Pack the LUT into 32K by storing a nybble for each index. This might help it fit into the cache on some systems.

Whether this is faster than an unrolled loop that does not use a lookup table will depend on your memory architecture.
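The nybble-packed table suggested above could be sketched as follows. All names here are made up for the example and the layout is one possible choice: even 16-bit values use the low nybble of an entry, odd values the high nybble. Stored values range from 0 to 8, so they fit in 4 bits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint8_t packed_lut[32768];  /* two 4-bit entries per byte */

void init_packed_lut(uint8_t octet)
{
    memset(packed_lut, 0, sizeof packed_lut);
    for (uint32_t v = 0; v < 65536; v++) {
        uint8_t entry = 0;  /* 0 = no match, otherwise shift + 1 */
        for (int j = 7; j >= 0; j--) {
            if (((v >> j) & 0xff) == octet) {
                entry = (uint8_t)(j + 1);
                break;
            }
        }
        packed_lut[v >> 1] |= (v & 1) ? (uint8_t)(entry << 4) : entry;
    }
}

unsigned packed_lookup(uint16_t v)
{
    uint8_t b = packed_lut[v >> 1];
    return (v & 1) ? (unsigned)(b >> 4) : (unsigned)(b & 0x0f);
}

/* Assumes init_packed_lut() was called with the same octet. */
long find_byte_packed(const uint8_t *a, size_t n, uint8_t octet)
{
    if (n == 0)
        return -1;
    if (a[0] == octet)
        return 0;

    uint16_t index = a[0];
    for (size_t i = 1; i < n; i++) {
        index = (uint16_t)((index << 8) | a[i]);
        unsigned e = packed_lookup(index);
        if (e)
            return (long)(i * 8) - (long)(e - 1);
    }
    return -1;
}
```

The extra shift and mask on every lookup is the price paid for halving the table, so this is only worth trying where the 64K table does not stay in cache.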
