Is C Endian Neutral?

Question

Is C Endian Neutral?

Is C neutral towards the end? Ok, another way to ask this question. I am currently translating a lot of code from C to Matlab on the same platform (PC). Do I need to take care of Enterianism? Both are date neutral, but C (not so sure), Matlab (pretty sure). In the same way, I also translated C into Python. Therefore, my question is, is there anyone in my experience (translating from C to another neutral-neutral language) met an unexpected problem with a large / small entity. Obviously, we are only talking about the base language. In this case, I mentioned C99.

-6

c endianness translate

dfm Feb 12 '16 at 20:36

source share

1 answer

Nominal Animal · Answer 1 · 2016-02-21 08:30

First, some background and explanation:

As I mentioned in the comment on the original question, the byte order is often confused with the bit order. Endianness refers only to byte order. The order of bits makes sense only in the documentation and when sending data through some serial connection.

In arithmetic, in the base B (and 2 ≤ B ∈ ℕ), i'th digit D _i has the value D _i B ⁱ . The least significant integral digit corresponds to i = 0, that is, D ₀ . For binary, B = 2. For ordinary decimal numbers, which most people prefer, B = 10.

(This works for all reals, not just integers. The most significant fractional digit, the first digit on the other side of the decimal point, is D _-1 , with a more negative i indicating less significant digits.)

Since the “bit” is the binary binary “portant”, we thus have a natural way of marking bits, with bit 0 being the least significant (integer) bit (corresponding to value 1), bit 1 is the next by value (corresponding to value 2) etc.

In some hardware documents that use the byte order of bytes, he insists on labeling the most significant bit in the word as “bit 0” (with bit numbers increasing left to right, unlike most numeric representations, where numbers become more significant from right to left). This is simply a marking convention, as this convention does not comply with arithmetic rules. In fact, you need to know the width (number of bits) in this word in order to even calculate the actual numerical value of such "bit 0" s.

Is C Neutral Neutral?

Yes, C (as in ISO C89, C99 and C11) is neutral with respect to byte order. Standards do not specify byte order; it depends on the implementation. In practice, the compiler chooses the byte order appropriate for the target architecture at compile time.

Theoretically, integers and floating point can have different byte order.

POSIX.1 adds network support for C. Certain fields in network-related structures are defined in network bytes, in most cases, mostly bytes. POSIX.1 provides the functions htons() , htonl() , ntohs() and ntohl() byteorder to convert from host to network byte order and vice versa.

In addition to the byte order of the network (often called big-endian), the byte order of the least significant byte (the least significant byte) is also very common, for example, on Intel / AMD architectures. The PDP-endian byte order (where four byte values are stored first by the second byte value, followed by the most significant byte, followed by the least significant, followed by the second-least significant byte) is currently rare.

Finally, C was implemented on a large number of architectures with byte orders covering all three of the above, without any problems with the byte order. This should be practical evidence.

I am currently translating a lot of code from C to Matlab [or Python] on the same platform (PC). Do I need to take care of Enterianism?

No, I see no reason for you to care about content when porting code between C, Matlab, Python, or almost any high-level language.

However:

A language that is neutral with respect to the end does not mean that you do not need to care about the content in your programs. The value of the data byte order . It comes down to how your program transfers - reading and writing - data; be through memory structures (using shared memory or between different programming languages through library bindings), to / from files, through network connections, or through channels from / to other programs.

If your programs transmit data in some text format, then you only need to worry about this format and, possibly, the character set used - I prefer UTF-8 (see utf8everywhere.org .

If your programs transmit data in binary format, you should understand that binary multibyte values always have a specific byte order. This can be byte order (or big-endian), little-endian, or native byte for the current architecture. Just because your programming language is neutral with respect to the final value does not mean that you should ignore the storage byte order.

For example, Matlab and Octave fread() support the fifth parameter, which determines the byte order used: native , ieee-be (IEEE big-endian) or ieee-le (IEEE little-endian). The Python module package is struct and unpacked by default into its own byte order and C alignment (addition), but you can use < or > as the first character in the format string to indicate little-endian or big-endian / network byte order- endian without filling.

Very often, C code stores a binary in its own byte order. However, some C codes do not. I prefer to keep the bytes in my own order, but also store the known prototype values for each different base numeric type, so that readers can trivially detect if they need to rearrange the byte order in order to correctly interpret the code. There are also various libraries and formats, such as NetCDF, that can be used to create portable binary data files.

The most important thing is to understand what C code does in the first place.

I don’t understand why someone would want to transfer the code from C to Matlab or Python, unless the C code was really poor - in this case I would just rewrite the logic and not transfer the existing code.

Did you encounter an unexpected problem with a large / small entity?

No, never transferring code between high-level languages.

Yes, when saving / retrieving binary data between different systems.

Although it is not content related to multidimensional data, it is important to remember that Fortran and Matlab (and OpenGL matrices) use the column order (each column is sequential in memory), and C uses the row order (each row is sequential in memory) .

Is C Endian Neutral?

More articles: