Bit Striping in C

I have 3 buffers containing data bits R, G, B running on a 32-bit processor.

I need to combine three bytes as follows:

R[0] = 0b r1r2r3r4r5r6r7r8 G[0] = 0b g1g2g3g4g5g6g7g8 B[0] = 0b b1b2b3b4b5b6b7b8 int32_t Out = 0b r1g1b1r2g2b2r3g3 b3r4g4b4r5g5b5r6 g6b6r7g7b7r8g8b8 xxxxxxxx 

where xxxxxxxx extends to each of the following bytes in the buffers.

I am looking for the best way to combine them. My approach is definitely ineffective.

Here is my approach

 static void rgbcombineline(uint8_t line) { uint32_t i, bit; uint8_t bitMask, rByte, gByte, bByte; uint32_t ByteExp, rgbByte; uint8_t *strPtr = (uint8_t*)&ByteExp; for (i = 0; i < (LCDpixelsCol / 8); i++) { rByte = rDispbuff[line][i]; gByte = gDispbuff[line][i]; bByte = bDispbuff[line][i]; bitMask = 0b00000001; ByteExp = 0; for(bit = 0; bit < 8; bit++) { rgbByte = 0; rgbByte |= ((rByte & bitMask) >> bit) << 2; rgbByte |= ((gByte & bitMask) >> bit) << 1; rgbByte |= ((bByte & bitMask) >> bit); ByteExp |= (rgbByte << 3*bit); bitMask <<= 1; } TempLinebuff[((i*3)+0) +2] = *(strPtr + 2); TempLinebuff[((i*3)+1) +2] = *(strPtr + 1); TempLinebuff[((i*3)+2) +2] = *(strPtr + 0); } } 
+6
source share
4 answers

If you can reserve 1024 bytes, you can achieve the desired result with a single lookup table of 256 elements:

 uint32_t lookup[256] = { 0, 1, 8, 9, 64, 65, ... /* map abcdefgh to a00b00c00d00e00f00g00h */ }; uint32_t result = (lookup[rByte] << 2) | (lookup[gByte] << 1) | lookup[bByte]; 

It uses only 3 searches, 2 shifts and 2 or operations, which should provide acceptable acceleration.

If you have more space, you can use three lookup tables to eliminate shifts (although this can lead to poor cache performance, so always check your profile!)

+6
source

You can use multiplication by a "magic" constant to replicate bits. Then use bit shifts to extract the necessary bits and bitwise masking to combine them. A β€œmagic” constant is a 17-bit binary file 10000000100000001. When multiplied by it, any 8-bit number is combined 3 times.

  r1r2r3r4r5r6r7r8 * M = r1 r2r3r4r5r6r7r8r1r2r3r4 r5 r6r7r8r1r2r3r4r5r6r7r8
 r1r2r3r4r5r6r7r8 * M shr 2 = 0 0 r1 r2 r3r4r5r6r7r8r1r2r3r4r5 r6 r7r8r1r2r3r4r5r6
 r1r2r3r4r5r6r7r8 * M shr 4 = 0 0 0 0 r1r2 r3 r4r5r6r7r8r1r2r3r4r5r6 r7 r8r1r2r3r4
 r1r2r3r4r5r6r7r8 * M shr 6 = 0 0 0 0 0 0 r1r2r3 r4 r5r6r7r8r1r2r3r4r5r6r7 r8 r1r2

Signs in bold indicate those in the right places.

If you use this disguise code

 R * M & 0b100000000000100000000000 | (R * M >> 2) & 0b000100000000000100000000 | (R * M >> 4) & 0b000000100000000000100000 | (R * M >> 6) & 0b000000000100000000000100 

you will get red bits combined correctly:

 r1 0 0 r2 0 0 r3 0 0 r4 0 0 r5 0 0 r6 0 0 r7 0 0 r8 0 0 

Then combine the blue and green bits in the same way.


Rough estimate of the number of operations:

  • Multiplications: 3
  • Bit Shifts: 9
  • Bitwise And: 12
  • Bitwise OR: 11
+3
source

You can use a 64-bit table that contains 6-bit bits, and then extract 2 bits each from r, g, and b and use the table for faster searches. Using a search size of 512 or 4096 may be more efficient.

 /* Converts bits abcdefghijkl to adgjbehkcfil */ static const uint32_t bitStripLookUp[4096] = { /* Hard coded values, can be generate with some script */ ... }; ... rByte = rDispbuff[line][i]; // rByte, gByte, bByte should be unit32 gByte = gDispbuff[line][i]; bByte = bDispbuff[line][i]; uMSB = ((rByte << 4) & 0x0F00) | (gByte & 0x00F0) | ((bByte >> 4) & 0x000F); // r7r6r5r4g7g6g5g4b7b6b5b4 uLSB = ((rByte << 8) & 0x0F00) | ((gByte << 4) & 0x00F0) | (bByte & 0x000F); // r3r2r1r0g3g2g1g0b3b2b1b0 stuffed_value = (bitStripLookUp[uMSB] << 12) | bitStripLookUp[uLSB]; 
+2
source

alternating with bitwise operators

 inline int interleave(int n) { n = ((n << 18) | (n << 9) | n) & 0007007007; // 000000111 000000111 000000111 n = ((n << 6) | (n << 3) | n) & 0444444444; // 100100100 100100100 100100100 return n; } r = interleave(r); g = interleave(g); b = interleave(b); rgb = r | (g >> 1) | (b >> 2); TempLinebuff[((i*3)+0) +2] = (rgb >> 16) & 0xFF; TempLinebuff[((i*3)+1) +2] = (rgb >> 8) & 0xFF; TempLinebuff[((i*3)+2) +2] = rgb & 0xFF; 

Another way -

striping using magic number multiplication using this packaging technique

Suppose rByte has 8 bits with the number 12345678 . After the stripes in the final result, these R bits will look like this, hyphens are not preambular bits.

 1--2--3--4--5--6--7--8------------------------------------------ 

We will distribute a bit of 8 bytes evenly

 unsigned long long r = (rByte * 0x0101010101010101ULL) * 0x8040201008040201ULL; 

Now r contains bits in rByte as

 1--------2--------3--------4--------5--------6--------7--------8 

hyphens all zeros


Explanation

  ........................................................12345678 (rByte) x ..............1......1......1......1......1......1......1......1 (Magic number, dots are 0s) __________________________________________________________________ ........................................................12345678 ................................................12345678.......↓ ........................................12345678......↓........↓ ................................12345678.....↓........↓........↓ + ........................12345678....↓........↓........↓........↓ ................12345678...↓........↓........↓........↓........↓ ........12345678..↓........↓........↓........↓........↓........↓ 12345678.↓........↓........↓........↓........↓........↓........↓ __________________________________________________________________ = 1........2........3........4........5........6........7........8 

To move a bit in r to their final positions, we divide r into 2 parts and get a bit in each part at their correct positions. The first part will move bits 1, 4, 5 and 8 with a magic number 0x40001040001 , and the second part will move the remaining bits with a magic number 0x01040001040 . These magic numbers can be calculated in the same way as above. Perhaps 32-bit multiplication is enough for this, but I have not tested it.

 #define RBIT(n) (1ULL << (8-n)*9) #define RMASK_1458 (RBIT(1) | RBIT(4) | RBIT(5) | RBIT(8)) #define RMASK_2367 (RBIT(2) | RBIT(3) | RBIT(6) | RBIT(7)) #define BIT(n) ((1ULL << 63) >> ((n-1)*3)) #define MASK_BIT1458 (BIT(1) | BIT(4) | BIT(5) | BIT(8)) #define MASK_BIT2367 (BIT(2) | BIT(3) | BIT(6) | BIT(7)) #define MAGIC_1458 0x40001040001ULL #define MAGIC_2367 0x01040001040ULL uint64_t resultR = (((r & RMASK_1458) * MAGIC_1458) & MASK_BIT1458) | (((r & RMASK_2367) * MAGIC_2367) & MASK_BIT2367); 

The bits for G and B can be computed in the same way. After that, you can simply combine the results together.

 result = (resultR >> 32) | (resultG >> 33) | (resultB >> 34); 
0
source

All Articles