Copying a 4-element char array into an integer in C

Question


A char is 1 byte, and an integer is 4 bytes. I want to copy the bytes of a char[4] into an integer, byte by byte. I tried several methods, but they give different results.

    char str[4] = "abc";
    unsigned int a = *(unsigned int*)str;
    unsigned int b = str[0]<<24 | str[1]<<16 | str[2]<<8 | str[3];
    unsigned int c;
    memcpy(&c, str, 4);
    printf("%u %u %u\n", a, b, c);

Output: 6513249 1633837824 6513249

Which one is correct? What is going wrong?

+8

c

avmohan Oct 11 '13 at 17:19


6 answers

Neither of the first two is correct.

The first violates the aliasing rules and may also fail because the address of str might not be correctly aligned for unsigned int. To reinterpret the bytes of the string as an unsigned int in the byte order of the host system, you can copy them with memcpy:

 unsigned int a; memcpy(&a, &str, sizeof a);

(This assumes that unsigned int and str have the same size.)

The second may cause an integer overflow, because str[0] is promoted to int, so str[0]<<24 has type int, but the value produced by the shift may be larger than int can represent. To fix this, use:

 unsigned int b = (unsigned int) str[0] << 24 | …;

This second method interprets the bytes from str in big-endian order, regardless of the byte order in the unsigned int on the host system.
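The two corrected approaches can be put side by side in a minimal sketch. The helper names `load_native` and `load_big_endian` are hypothetical, and the code assumes that unsigned int is 4 bytes wide, as in the question. The extra cast through unsigned char is an addition to the answer's fix: it keeps a byte of 0x80 or above from being sign-extended on systems where char is signed.

```c
#include <string.h>

/* Hypothetical helper: reinterpret 4 bytes in the host's byte order.
   Assumes sizeof(unsigned int) == 4. */
unsigned int load_native(const char bytes[4])
{
    unsigned int value;
    memcpy(&value, bytes, sizeof value);
    return value;
}

/* Hypothetical helper: treat bytes[0] as the most significant byte,
   whatever the host's byte order is. */
unsigned int load_big_endian(const char bytes[4])
{
    return (unsigned int)(unsigned char)bytes[0] << 24
         | (unsigned int)(unsigned char)bytes[1] << 16
         | (unsigned int)(unsigned char)bytes[2] << 8
         | (unsigned int)(unsigned char)bytes[3];
}
```

For the question's "abc" array, `load_big_endian` returns 1633837824 on every host, while `load_native` returns 6513249 on a little-endian host and 1633837824 on a big-endian one.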

+5

Eric Postpischil Oct 11 '13 at 17:30


 unsigned int a = *(unsigned int*)str;

This initialization is incorrect and causes undefined behavior. It violates the C aliasing rules and may also violate the processor's alignment requirements.

+1

ouah Oct 11 '13 at 17:25


Both are correct in a sense:

Your first solution copies in native byte order (i.e. the byte order used by the CPU) and may therefore produce different results depending on the type of CPU.
The second solution copies in big-endian byte order (i.e. the most significant byte at the lowest address) regardless of what the CPU uses. It gives the same value on every type of CPU.

Which one is correct depends on how the source data (the char array) is meant to be interpreted.
For example, Java class files always use big-endian byte order, regardless of what the CPU uses. So if you want to read an int from a Java class file, you must use the second method. In other cases you may want the CPU-dependent method (I think Matlab writes ints to files in native byte order; cf. this question).
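When the byte order is fixed by a file format rather than by the CPU, the decoding can be written once per byte order. The following is a sketch only; `read_be32` and `read_le32` are hypothetical names, and the input is taken as unsigned char so that no sign extension can occur.

```c
#include <stdint.h>

/* Hypothetical helper: decode a 32-bit field stored most significant
   byte first (big-endian), independent of the host CPU. */
uint32_t read_be32(const unsigned char *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16
         | (uint32_t)p[2] << 8  | (uint32_t)p[3];
}

/* Hypothetical helper: decode a 32-bit field stored least significant
   byte first (little-endian), independent of the host CPU. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[3] << 24 | (uint32_t)p[2] << 16
         | (uint32_t)p[1] << 8  | (uint32_t)p[0];
}
```

As an illustration, a Java class file starts with the bytes CA FE BA BE, which `read_be32` decodes to 0xCAFEBABE on any machine.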

+1

Curd Oct 11 '13 at 17:25


You said you want to copy byte by byte.

This means that the line unsigned int a = *(unsigned int*)str; is not allowed. However, what you are doing is a fairly common way to read an array as another type (for example, when you read a stream from disk).

It just needs some adjustment:

    const char *str = "abc";
    unsigned a;
    char *c = (char *)&a;
    size_t i;
    /* Copy byte by byte into the object representation of a. */
    for (i = 0; i < sizeof(unsigned); i++) {
        c[i] = str[i];
    }
    printf("%u\n", a);

Remember that the data you are reading may not have the same endianness as the machine you are reading it on. This can help:

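    #include <stdint.h>
    #include <string.h>

    /* Reverse the byte order of the 32-bit value that data points to. */
    void changeEndian32(void *data)
    {
        uint8_t *cp = (uint8_t *)data;
        union {
            uint32_t word;
            uint8_t bytes[4];
        } temp;

        temp.bytes[0] = cp[3];
        temp.bytes[1] = cp[2];
        temp.bytes[2] = cp[1];
        temp.bytes[3] = cp[0];
        /* memcpy instead of *(uint32_t *)data, so data need not be aligned. */
        memcpy(data, &temp.word, sizeof temp.word);
    }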
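The same reversal can also be expressed on a value instead of through a pointer, which avoids any question of alignment. This is a sketch under that assumption, not part of the answer; the name `swap32` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: return value with its four bytes reversed. */
uint32_t swap32(uint32_t value)
{
    return (value >> 24)
         | ((value >> 8) & 0x0000FF00u)
         | ((value << 8) & 0x00FF0000u)
         | (value << 24);
}
```

Applied to the question's numbers, `swap32(0x61626300)` gives 0x00636261, i.e. it turns the big-endian reading of "abc" into the little-endian one.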

+1

ldrumm Oct 11 '13 at 17:28


If you use the CVI compiler (National Instruments), you can use the Scan function to do this:

unsigned int a;

For big-endian: Scan(string, "%1i[b4uzi1o3210]>%i", &a);

For little-endian: Scan(string, "%1i[b4uzi1o0123]>%i", &a);

The o modifier specifies the byte order. The i inside the square brackets indicates where to start in the str array.

0

lupy87 Dec 16 '17 at 23:51


Jon · Accepted Answer · 2013-10-11T17:25:33+0000

This is an endianness problem. When you interpret the char* as an int*, the first byte of the string becomes the least significant byte of the integer (because you ran this code on x86, which is little-endian), whereas with the manual conversion the first byte becomes the most significant.

To put this in pictures, here is the original array:

       a      b      c      \0
    +------+------+------+------+
    | 0x61 | 0x62 | 0x63 | 0x00 |  <---- bytes in memory
    +------+------+------+------+

When these bytes are interpreted as an integer on a little-endian architecture, the result is 0x00636261, which is decimal 6513249. On the other hand, placing each byte manually gives 0x61626300, which is decimal 1633837824.
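To check which layout a given host uses, the bytes of an integer can be inspected through an unsigned char pointer, which the aliasing rules do permit. A minimal sketch, with the hypothetical helper name `byte_at`:

```c
#include <stddef.h>

/* Hypothetical helper: return the byte stored at offset i inside the
   object representation of *value. Reading any object through an
   unsigned char pointer is always allowed. */
unsigned char byte_at(const unsigned int *value, size_t i)
{
    return ((const unsigned char *)value)[i];
}
```

With `unsigned int v = 0x00636261;`, `byte_at(&v, 0)` yields 0x61 on a little-endian host and 0x00 on a big-endian host with 4-byte unsigned int.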

Of course, treating a char* as an int* is undefined behavior, so the difference is not important in practice, because you are not actually allowed to use the first conversion. However, there is a way to achieve the same result, which is called type punning:

    union {
        char str[4];
        unsigned int ui;
    } u;

    strcpy(u.str, "abc");
    printf("%u\n", u.ui);
