Theory and concept

This is not a question tied to any particular programming language. Say you have a file that was written on a big-endian machine, and you know that. If two single-byte values were written back to back, how would you know? Big-endianness reverses the byte order of 16-, 32-, and 64-bit values, so how do you know you need to read them as separate bytes instead?

For example, you write the byte 0x11, then the byte 0x22. The file then contains 0x1122. If you read this on a little-endian machine, you would have to convert it. So would you read it as 0x2211 or as 0x1122? How would you know?
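For concreteness, a tiny C sketch of that ambiguity: the same two bytes give a different 16-bit value depending on the endianness of the machine reading them.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    unsigned char buf[2] = { 0x11, 0x22 };  /* the two bytes exactly as they sit in the file */
    uint16_t word;

    memcpy(&word, buf, sizeof word);        /* reinterpret them as one 16-bit word */

    /* Prints 0x1122 on a big-endian host and 0x2211 on a little-endian
       host. Nothing in the bytes themselves says which was intended. */
    printf("as a 16-bit word: 0x%04X\n", (unsigned)word);
    return 0;
}
```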

Does that make sense? I feel like I'm missing something super basic here.

+2
8 answers

There is no way to find out. That's why formally defined file formats usually specify an endianness, or provide a way to indicate it (as in the case of Unicode, mentioned by MSN). So if you are reading a file in a specific format, you already know whether it is big-endian, because being in that format implies a specific byte order.

Another good example of this is network byte order: network protocols are usually big-endian, so if you are a little-endian processor talking to the internet, you have to swap bytes. If you are big-endian, you don't need to worry about it. People use functions like htonl and ntohl to normalize what they write to the network so that their source code is the same on all machines. These functions are defined to do nothing on big-endian machines and to swap the bytes on little-endian machines.
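A minimal sketch of how those functions are typically used (assuming a POSIX system, where they are declared in <arpa/inet.h>):

```c
#include <stdint.h>
#include <stdio.h>
#include <arpa/inet.h>   /* htonl / ntohl on POSIX systems */

int main(void) {
    uint32_t host_value = 0x12345678;

    /* Convert to network byte order (big-endian) before sending... */
    uint32_t wire_value = htonl(host_value);

    /* ...and back to host order after receiving. On a big-endian
       machine both calls are no-ops; on a little-endian machine they
       swap the bytes. Either way, the source code is identical. */
    uint32_t back = ntohl(wire_value);

    printf("round-trip: 0x%08X\n", (unsigned)back);   /* always 0x12345678 */
    return 0;
}
```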

The key point is that endianness is a property of how a particular architecture represents words in memory. It is not a mandate that files must be written a certain way; it just tells you that the architecture's instructions expect multi-byte words to have their bytes ordered a certain way. A big-endian machine can write exactly the same sequence of bytes as a little-endian machine; it may just take a few extra instructions to do so, because it has to swap the byte order. The same can be said of little-endian machines writing big-endian formats.
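One way to see this: a pair of helpers that write and read a fixed (here big-endian) byte order using shifts, so the same source code behaves identically on either kind of machine. The helper names are just for illustration:

```c
#include <stdint.h>

/* Write a 32-bit value in big-endian order, byte by byte, regardless of
   the endianness of the machine running this code. The shifts pick the
   bytes out of the value; memory layout never enters into it. */
static void put_be32(unsigned char *out, uint32_t v) {
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* The matching reader: rebuilds the value from the bytes, again without
   caring what the host architecture is. */
static uint32_t get_be32(const unsigned char *in) {
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |
            (uint32_t)in[3];
}
```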

+6

You either have to know this from something else (i.e. you know you are reading the file in big-endian format), or the endianness has to be encoded in the file somehow. Unicode text files use a byte order mark (the character U+FEFF) as the first two bytes of the file to signal endianness. If you read those bytes as 0xFEFF, the file is in your native byte order. If you read them as 0xFFFE, it is not.
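A minimal sketch of that check (the helper name is made up; it assumes a UTF-16 file opened in binary mode):

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the first two bytes of a UTF-16 file as a
   native 16-bit word and compare against the byte order mark U+FEFF.
   Returns 1 = file matches this machine's byte order, 0 = opposite
   order (swap every value while reading), -1 = no BOM present. */
static int bom_matches_host(FILE *f) {
    unsigned char b[2];
    uint16_t bom;
    if (fread(b, 1, 2, f) != 2)
        return -1;
    memcpy(&bom, b, sizeof bom);   /* reinterpret in the host's own order */
    if (bom == 0xFEFF) return 1;   /* reads as the BOM: same endianness */
    if (bom == 0xFFFE) return 0;   /* reads byte-swapped: opposite endianness */
    return -1;
}
```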

+2

You are absolutely right ... without any information about the data you are looking at, there is no way to find out.

That said, there are often ways to guess... if you know you should be seeing text, you can run some simple tests to see whether what you get makes sense... if you can read a header, you can often work it out from that... but if you are just looking at a raw stream of bytes, there is no sure way to know.

+1

Does that make sense?

Yes: this is a problem.

I feel like I'm missing something super basic here.

Basically, in order to read a file (especially a binary file), you need to know the file format, and that includes knowing whether a given pair of bytes is two individual bytes or a single two-byte word.

+1

You are missing nothing. Well-defined binary file formats (Excel 97-2003 xls workbooks, for example) should include endianness as part of the specification, or you will obviously have big problems.

Historically, the Macintosh used Motorola processors (the 68000 and its successors), which were big-endian, while IBM PC / DOS / Windows computers have always used Intel processors, which are little-endian. So software vendors with C/C++ codebases that run on both platforms are very familiar with this issue, while vendors who have only ever developed Windows software, or who developed Mac software before Apple switched to Intel, could simply ignore it - at least for their own file formats.

+1

Not sure if this is exactly what you are asking, but, for example, the PCAP file format specifies a variable endianness:

http://www.winpcap.org/ntar/draft/PCAP-DumpFileFormat.html

The concept is that you write a marker value, for example 0x12345678, into the header of your file. On a big-endian machine, such as a PowerPC, it will be written as:

0x12 0x34 0x56 0x78

On a "small destination" machine, such as x86, it will be written as follows:

0x78 0x56 0x34 0x12

Then, when you read the header back, you can tell what kind of machine wrote the file and whether you need to swap bytes while reading it. Or you can mandate a single endianness, for example big-endian; then you would always swap bytes on a little-endian machine.

In the case of the PCAP format, this was done for performance reasons. But it is probably easier to specify one endianness and stick to it.
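A rough sketch of that marker check in C, using the example value above (the constant names and helpers are illustrative, not the actual libpcap API):

```c
#include <stdint.h>
#include <stdio.h>

#define MAGIC         0x12345678u   /* marker value from the example above */
#define MAGIC_SWAPPED 0x78563412u   /* the same bytes seen from the other endianness */

/* The swap you would apply to every 32-bit field when the file turns
   out to be in the "other" byte order. */
static uint32_t swap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Read the marker in the host's native order and decide whether the
   rest of the file needs its multi-byte fields swapped. */
static int must_swap(FILE *f) {
    uint32_t magic;
    if (fread(&magic, sizeof magic, 1, f) != 1) return -1;
    if (magic == MAGIC)         return 0;   /* written by a machine like ours */
    if (magic == MAGIC_SWAPPED) return 1;   /* written by the other kind: apply swap32 on read */
    return -1;                              /* not this file format */
}
```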

+1

A processor runs in one endian mode or the other (some can switch, e.g. per page). It does not know whether it is doing the right thing or not; it just does what it does. (Garbage In, Garbage Out.) :-)

0

There is no way to detect it, I would say. But in C#, BitConverter has an IsLittleEndian property.

It all depends on how you want to interpret it.

More details here.
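Note that a property like that only tells you the byte order of the machine your code is running on, not of the data you are reading. A rough C equivalent of the same runtime check, as a sketch:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Store a known 16-bit value and look at which byte ends up first in
   memory; the low byte coming first means the host is little-endian. */
static int is_little_endian(void) {
    uint16_t probe = 0x0001;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 0x01;
}

int main(void) {
    printf("this machine is %s-endian\n", is_little_endian() ? "little" : "big");
    return 0;
}
```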

0
