The difference between Big-Endian and Little-Endian byte order

What is the difference between the Big Endian and Little Endian byte order?

Both of them seem to be related to Unicode and UTF-16. Where exactly do we use them?

+59
unicode endianness utf-16
Mar 31 '09 at 15:37
6 answers

Big-Endian (BE) / Little-Endian (LE) are two ways to organize multi-byte words. For example, when using two bytes to represent a character in UTF-16, there are two ways to represent the character 0x1234 as a string of bytes (0x00-0xFF):

 Byte Index:     0  1
 ---------------------
 Big-Endian:    12 34
 Little-Endian: 34 12
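
As a quick cross-check, here is a minimal Python sketch (using only the standard struct module; ">H" and "<H" mean big-endian and little-endian unsigned 16-bit):

import struct

value = 0x1234

# Pack the 16-bit value in each byte order.
big = struct.pack(">H", value)     # b'\x12\x34'
little = struct.pack("<H", value)  # b'\x34\x12'

print(big.hex())     # 1234
print(little.hex())  # 3412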

To determine whether a text uses UTF-16BE or UTF-16LE, the specification recommends prepending a Byte Order Mark (BOM), the character U+FEFF, to the text. So, if the first two bytes of a UTF-16 encoded text file are FE, FF, the encoding is UTF-16BE; if they are FF, FE, it is UTF-16LE.

Visual example: the word "Example" in different encodings (UTF-16 with BOM):

 Byte Index:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
 ------------------------------------------------------------
 ASCII:       45 78 61 6d 70 6c 65
 UTF-16BE:    FE FF 00 45 00 78 00 61 00 6d 00 70 00 6c 00 65
 UTF-16LE:    FF FE 45 00 78 00 61 00 6d 00 70 00 6c 00 65 00
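
As a rough illustration (assuming Python 3 and its standard codecs module), the byte sequences above can be reproduced like this:

import codecs

text = "Example"

ascii_bytes = text.encode("ascii")
# Prepend the BOM explicitly; the -be/-le codecs themselves do not write one.
utf16be = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
utf16le = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

print(ascii_bytes.hex(" "))  # 45 78 61 6d 70 6c 65
print(utf16be.hex(" "))      # fe ff 00 45 00 78 00 61 00 6d 00 70 00 6c 00 65
print(utf16le.hex(" "))      # ff fe 45 00 78 00 61 00 6d 00 70 00 6c 00 65 00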

For more information, please read the Endianness and/or UTF-16 pages.

+112
Mar 31 '09 at 15:41

Ferdinand's answer (and others) is correct, but incomplete.

Big Endian (BE) / Little Endian (LE) have nothing to do with UTF-16 or UTF-32 specifically. They existed before Unicode and affect how the bytes of numbers are stored in computer memory. They depend on the processor.

If you have a number with the value 0x12345678 , then in memory it will be represented as 12 34 56 78 (BE) or 78 56 34 12 (LE).
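
A small sketch of the same value in Python (standard struct and sys modules only), including the native byte order of whatever machine runs it:

import struct
import sys

value = 0x12345678

print(struct.pack(">I", value).hex(" "))  # 12 34 56 78  (big-endian)
print(struct.pack("<I", value).hex(" "))  # 78 56 34 12  (little-endian)
print(sys.byteorder)                      # 'little' on x86/x86-64, 'big' on some other CPUs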

UTF-16 and UTF-32 use code units of 2 and 4 bytes, respectively, so for them the byte order follows whatever byte order numbers use on that platform.

+29
Jul 24 '09 at 8:30

UTF-16 encodes Unicode into 16-bit values. Most modern file systems operate on 8-bit bytes. So, to save, for example, a UTF-16 encoded file to disk, you must decide which half of each 16-bit value goes into the first byte and which goes into the second byte.
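
A minimal sketch of that decision in Python (the file name demo.txt is just a placeholder): the built-in utf-16 codec writes a BOM and uses the platform's native byte order, while utf-16-be / utf-16-le fix the order explicitly and write no BOM.

# Write a UTF-16 file; the 'utf-16' codec prepends a BOM in the platform's byte order.
with open("demo.txt", "w", encoding="utf-16") as f:
    f.write("example")

# Reading with 'utf-16' consumes the BOM and picks the matching byte order automatically.
with open("demo.txt", "r", encoding="utf-16") as f:
    print(f.read())  # example

# Inspect the raw bytes to see the BOM followed by the per-character byte pairs.
with open("demo.txt", "rb") as f:
    print(f.read().hex(" "))  # e.g. ff fe 65 00 78 00 ... on a little-endian machine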

Wikipedia provides a more complete explanation.

+7
Mar 31 '09 at 15:44

little-endian: adj.

Describes a computer architecture in which, within a given 16- or 32-bit word, bytes at lower addresses have lower significance (the word is stored "little-end-first"). The PDP-11 and VAX families of computers and Intel microprocessors, as well as a lot of communications and networking hardware, are little-endian. The term is sometimes used to describe the ordering of units other than bytes; most often, bits within a byte.

big-endian: adj.

[common; from Swift's Gulliver's Travels via the famous paper "On Holy Wars and a Plea for Peace" by Danny Cohen, USC/ISI IEN 137, dated April 1, 1980]

Describes a computer architecture in which, within a given multi-byte numeric representation, the most significant byte has the lowest address (the word is stored "big-end-first"). Most processors, including the IBM 370 family, the PDP-10, the Motorola microprocessor families, and most of the various RISC designs, are big-endian. Big-endian byte order is also sometimes called network order.

--- from the Jargon file: http://catb.org/~esr/jargon/html/index.html

+4
May 04 '10 at 15:37

Big-endian and little-endian are terms that describe the order in which a sequence of bytes is stored in computer memory.

  1. Big-endian is the order in which the "big end" (the most significant value in the sequence) is stored first (at the lowest storage address).
  2. Little-endian is the order in which the low end (the least significant value in the sequence) is stored first.

For example,

On a big-endian computer, the two bytes needed for the hexadecimal number 4F52 would be stored as 4F52 in storage (if 4F is stored at storage address 1000, for example, 52 will be at address 1001).

In a little-endian system, it would be stored as 524F (52 at address 1000, 4F at 1001).
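
The same 4F52 example, sketched with Python's int.to_bytes (the byte-order argument names which end is stored first):

value = 0x4F52

print(value.to_bytes(2, "big").hex())     # 4f52 -> 4F at the lower address
print(value.to_bytes(2, "little").hex())  # 524f -> 52 at the lower address

# Reading the same two bytes with the wrong order yields a different number.
print(int.from_bytes(b"\x4f\x52", "big"))     # 20306 (0x4F52)
print(int.from_bytes(b"\x4f\x52", "little"))  # 21071 (0x524F)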

+2
Feb 18 '15 at 6:01

For Unicode/UTF-16 encodings, the byte order (big-endian or little-endian) must be specified, because for character codes that use more than one byte there is a choice of reading or writing the most significant byte first or last. Unicode/UTF-16 require this because they are variable-length encodings (i.e., each character can be represented by one or more bytes). (Note, however, that UTF-8 "words" are always 8 bits / one byte long, even though a character may occupy several of them, so there is no endianness problem.)

If the encoder of a byte stream representing Unicode text and its decoder do not agree on which convention is used, the wrong character codes may be interpreted. For this reason, either the endianness convention is known in advance or, more commonly, a byte order mark is placed at the beginning of any Unicode text file/stream to indicate whether big-endian or little-endian order is used.
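
A rough sketch of that detection step in Python (standard codecs module; raw_bytes and detect_utf16_order are illustrative names, not a standard API):

import codecs

def detect_utf16_order(raw_bytes):
    """Choose a UTF-16 decoder from the byte order mark, if one is present."""
    if raw_bytes.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "utf-16-be", raw_bytes[2:]
    if raw_bytes.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "utf-16-le", raw_bytes[2:]
    return None, raw_bytes  # no BOM: the convention must be agreed in advance

encoding, payload = detect_utf16_order(b"\xfe\xff\x00E\x00x")
print(encoding, payload.decode(encoding))  # utf-16-be Ex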

+1
Mar 31 '09 at 15:45


