C++: union vs. bitwise operators

I have two chars, and I want to "stitch" them together bitwise.
For instance:

    char c1 = 11;                  // 0000 1011
    char c2 = 5;                   // 0000 0101
    short int si = stitch(c1, c2); // 0000 1011 0000 0101

So, I first tried with bitwise operators:

    short int stitch(char c1, char c2) {
        return (c1 << 8) | c2;
    }

But this does not work: I get a short equal to c2. (1) Why?
(Note: c1 and c2 are negative numbers in my real application; maybe this is part of the problem?)

So my second solution was to use a union:

    union stUnion {
        struct {
            char c1;
            char c2;
        };
        short int si;
    };

    short int stitch(char c1, char c2) {
        stUnion u;
        u.c1 = c1;
        u.c2 = c2;
        return u.si;
    }

It works the way I want, I think.

(2) What is the best / fastest way?

Thanks!

+4
5 answers

The union method is, at best, implementation-defined (in practice it will work quite reliably, but the format of si depends on the byte order of the platform).

The problem with the bitwise approach is, as you suspect, related to negative numbers. A negative number is represented by a run of leading 1 bits. So -5, for example, is

 1111 1011 

If you cast this value to int, or even to unsigned int, it becomes

 1111 1111 1111 … 1111 1011 

and all those 1s will drown out the left-shifted data when the OR is applied.
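To make the sign-extension effect concrete, here is a minimal sketch. The function names are made up for illustration, `signed char` is spelled out so the result does not depend on whether plain `char` is signed, and the second function assumes a two's complement machine (which is what practically every platform uses).

```cpp
// Bits 8..15 of a signed char after it has been widened to unsigned int.
// For a negative input these bits are all 1s (sign extension).
constexpr unsigned high_byte_after_widening(signed char c) {
    return (static_cast<unsigned>(c) >> 8) & 0xFFu;
}

// The stitch from the question, unchanged apart from the explicit signed char.
// Only call it with a non-negative c1: shifting a negative value left is
// undefined behaviour before C++20.
constexpr int stitch_naive(signed char c1, signed char c2) {
    return (c1 << 8) | c2;  // both operands are promoted to int first
}

// A negative low byte swallows the shifted high byte entirely, which matches
// the "I get a short equal to c2" symptom from the question.
static_assert(high_byte_after_widening(-5) == 0xFFu, "upper bits are all 1s");
static_assert(high_byte_after_widening(5) == 0u, "upper bits stay 0");
static_assert(stitch_naive(11, -5) == -5, "result is just c2");
static_assert(stitch_naive(11, 5) == 0x0B05, "works for non-negative input");
```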

To solve the problem, cast the chars to unsigned char and then to int (to prevent overflow, or even the appearance of possible overflow) before shifting:

    short int stitch(char c1, char c2) {
        return ((int)(unsigned char)c1 << 8) | (unsigned char)c2;
    }

or, if you are free to change the argument types and can include <cstdint>,

    uint16_t stitch(uint8_t c1, uint8_t c2) {
        return ((int)c1 << 8) | c2;
    }
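As a quick sanity check of the cast-first idea, here is a small sketch with compile-time tests. It is a variation rather than the exact code above: the name `stitch_fixed` is hypothetical, it widens to `unsigned` instead of `int`, it assumes 8-bit chars, and it returns `std::uint16_t` so that no conversion of an out-of-range value to a signed short is involved.

```cpp
#include <cstdint>

// Strip the sign by going through unsigned char first, then widen and shift.
constexpr std::uint16_t stitch_fixed(char c1, char c2) {
    return static_cast<std::uint16_t>(
        (static_cast<unsigned>(static_cast<unsigned char>(c1)) << 8) |
        static_cast<unsigned>(static_cast<unsigned char>(c2)));
}

static_assert(stitch_fixed(11, 5) == 0x0B05, "example from the question");
static_assert(stitch_fixed(-2, -3) == 0xFEFD, "negative inputs keep both bytes");
static_assert(stitch_fixed(-5, 0) == 0xFB00, "high byte is not sign-extended");
```

If a `short int` is really needed, the result can be converted afterwards; for values above 0x7FFF that conversion is implementation-defined before C++20, although it gives the expected bit pattern on two's complement platforms.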
+7

§5.8/1 states: "The operands shall be of integral or enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand."

So try casting c1 to unsigned int and then doing the bitwise OR with c2. Also return the result as unsigned int. chars are promoted to int, but we want the result to be unsigned int.

+3

The reason is that c2 is first promoted to int before the bitwise OR is performed, which causes sign extension (assuming that char is signed and can hold negative values):

    char x1 = -2;                  // 1111 1110
    char x2 = -3;                  // 1111 1101
    short int si = stitch(x1, x2); // 1111 1111 1111 1101

The representation of x2 promoted to int has (at least) one upper byte filled with 1s, so it overwrites the zero bits next to x1, which you previously shifted up. You can cast to unsigned char first. With two's complement representation, that will not change the bit pattern in the low byte. Although it is not strictly necessary, you can cast c1 to unsigned char too, for consistency (if short is 2 bytes long, it does not matter that c1 was sign-extended beyond those two bytes).

    short int stitch(char c1, char c2) {
        return ((unsigned char)c1 << 8) | (unsigned char)c2;
    }
+2

The shift/or method, once fixed, is cleaner since it does not depend on byte order.
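A minimal sketch of the byte-order point, with hypothetical function names. To stay within defined behaviour it uses `std::memcpy` as a stand-in for the union: the effect on the resulting value is the same, since both just reinterpret two bytes in memory order.

```cpp
#include <cstdint>
#include <cstring>

// Shift/or: the result is defined arithmetically, so it is the same everywhere.
inline std::uint16_t stitch_shift(std::uint8_t hi, std::uint8_t lo) {
    return static_cast<std::uint16_t>((static_cast<unsigned>(hi) << 8) | lo);
}

// Memory-based stitch: the two bytes are laid out in memory order and then
// read back as one 16-bit value, so the result depends on the platform.
inline std::uint16_t stitch_memory(std::uint8_t first, std::uint8_t second) {
    const std::uint8_t bytes[2] = {first, second};
    std::uint16_t out = 0;
    std::memcpy(&out, bytes, sizeof out);
    return out;
}
```

With the inputs 11 and 5, `stitch_shift` always yields 0x0B05, while `stitch_memory` yields 0x0B05 on a big-endian machine and 0x050B on a little-endian one.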

In addition, the union method is probably slower on many modern CPUs because of a store-to-load forwarding (STLF) problem. You write a value to memory and then read it back as a different data type. Many CPUs cannot forward the data to the load quickly when this happens. The load has to wait until the store has fully completed (retired), writing its data to the L1 cache.

On very old CPUs without a barrel shifter (where shifting by 8 takes 8 operations) and with simple in-order execution, such as the 68000, the union method may be faster.

+1

You should not use a union for this. You should never use the fields of a union at the same time. If a union has a member A and a member B, you must treat A and B as unrelated. This is because the compiler is free to add padding anywhere (except at the front of a struct). Another problem is byte order (little/big endian).

// EDIT: There is one exception to the above “union rule”: you can use, at the same time, those members that are at the front and have the same layout, i.e.

    union {
        struct { char c; int i; short s; } A;
        struct { char c; int i; char c1; char c2; } B;
    };

A.c and A.i can be used simultaneously with B.c and B.i.
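A small sketch of that exception in use. The union name `Overlay` and the helper function are made up for illustration; the two structs are the ones from the example above, and both are standard-layout types that start with the same `char`, `int` sequence.

```cpp
union Overlay {
    struct { char c; int i; short s; } A;
    struct { char c; int i; char c1; char c2; } B;
};

// Writes through A, then reads the shared leading members through B.
// Only c and i belong to the common initial sequence; reading B.c1 or B.c2
// after writing A.s would not be covered by this exception.
inline bool common_prefix_readable() {
    Overlay u{};   // value-initialises the first member, so A is active
    u.A.c = 'x';
    u.A.i = 42;
    return u.B.c == 'x' && u.B.i == 42;
}
```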

-1
