Why are DWORD values usually represented in hexadecimal?

I am trying to understand why DWORD values are so often described in hexadecimal format on MSDN.

The reason I am digging into this is that I am trying to understand, fundamentally, why all these different data types exist. A mentor hinted to me that Microsoft's creation of DWORD and the other types has something to do with the evolution of processors. That would give some meaning and context to my understanding of these data types, and I would like more of that context and background.

In any case, I could use some explanation, or some resources, on how to remember the difference between DWORD, unsigned int, byte, bit, WORD, etc.

So my questions are: 1) Why are DWORDs presented in hex? 2) Can you recommend resources on the differences between the numeric data types and why they were created?

+7
5 answers

Everything inside a computer is a bunch of 0s and 1s. But writing out a whole DWORD in binary is quite tedious:

00000000 11111111 00000000 11111111 

To save space and improve readability, we like to write it in a shorter form. Decimal is what we are most familiar with, but it does not map well onto binary. Octal and hexadecimal, however, line up exactly with the binary digits:

    // each octal digit is exactly 3 binary digits
    01 010 100 binary = 124 octal

    // each hexadecimal digit is exactly 4 binary digits
    0101 0100 binary = 54 hexadecimal

Since hexadecimal maps so neatly onto 8-bit bytes (2 hex digits make one byte), that notation stuck and is what gets used most. It is easier to read, easier to understand, and easier to line up when you are messing with bitmasks.

The usual notation for indicating which base is being used:

    1234543   = decimal
    01234543  = octal (leading zero)
    0x1234543 = hexadecimal (starts with 0x)
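As a small illustration (my own hedged sketch, not part of the original answer), C++ uses exactly these prefixes for integer literals, plus 0b for binary since C++14:

    #include <iostream>

    int main() {
        // The same number, 84, written in the notations described above.
        int d = 84;          // decimal
        int o = 0124;        // octal (leading zero)
        int h = 0x54;        // hexadecimal (leading 0x)
        int b = 0b01010100;  // binary (C++14, leading 0b)

        std::cout << std::boolalpha << (d == o && o == h && h == b) << '\n';  // true

        // Print the same value back out in the three classic bases.
        std::cout << std::dec << d << ' ' << std::oct << d << ' ' << std::hex << d << '\n';  // 84 124 54
    }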

Regarding your question about BYTE, WORD, DWORD, etc.

Computers started with the bit. Just a 1 or a 0. It even had a cameo in the original Tron.

Bytes are 8 bits long (well, once upon a time there were 7-bit bytes, but we can ignore them). That lets you store a number from 0 to 255, or a signed number from -128 to 127. Better than just 1/0, but still limited. You may have heard references to "8-bit games"; this is what that refers to. Those systems were built around bytes.

Then computers grew to 16-bit registers. That is 2 bytes, which became known as a WORD (no, I do not know why). Now numbers could be 0 to 65535, or -32768 to 32767.

We kept wanting more power, and computers were expanded to 32-bit registers. 4 bytes, 2 WORDs, also known as a DWORD (double word). To this day you can look in C:\Windows and see both a "system" directory (old 16-bit pieces) and "system32" (the newer 32-bit components).

Then came the QWORD (quad word): 4 WORDs, 8 bytes, 64 bits. Ever heard of the Nintendo 64? That is where the name came from. Modern architecture is here now: the processor's internals contain 64-bit registers, and you can usually run either a 32-bit or a 64-bit operating system on such a processor.

That covers BIT, BYTE, WORD and DWORD. These are raw types and are often used for flags, bitmasks, etc. If you want to hold an actual number, it is usually best to use a signed/unsigned integer, long, etc.

I did not cover floating point numbers, but hopefully this helps with the general idea.

+9

DWORD constants are typically written in hex when they are used as flags that can be OR'd together. It makes that easier to see. That is why you see 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, etc. Programmers simply recognize those values as having binary representations with only one bit set.

When it is an enumeration instead, you will see 0x01, 0x02, 0x03, etc. They are often still written in hex because programmers tend to fall into these habits!
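A minimal sketch of that flag pattern in C++ (the flag names here are made up for illustration, not taken from any real API):

    #include <cstdint>
    #include <iostream>

    // Hypothetical flags: each constant has exactly one bit set, which is
    // obvious from the hex spelling (0x01, 0x02, 0x04, ...) but not from decimal.
    constexpr std::uint32_t FLAG_READ    = 0x01;
    constexpr std::uint32_t FLAG_WRITE   = 0x02;
    constexpr std::uint32_t FLAG_EXECUTE = 0x04;
    constexpr std::uint32_t FLAG_HIDDEN  = 0x08;

    int main() {
        std::uint32_t flags = FLAG_READ | FLAG_WRITE;  // OR the flags together

        if (flags & FLAG_WRITE)                        // AND with a mask to test one bit
            std::cout << "write flag is set\n";

        flags &= ~FLAG_READ;                           // clear a single flag
        std::cout << std::hex << "remaining flags: 0x" << flags << '\n';  // remaining flags: 0x2
    }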

+4

Just for the record, 16-bit unsigned data is called a WORD because at the time, computers had 16-bit registers.

Early in computer history, 8 bits was the largest piece of data that could be stored in a register. Since it could hold an ASCII character, it was commonly called a CHAR.

Then 16-bit computers came out, and CHAR was no longer a fitting name for a 16-bit piece of data. So 16-bit data was commonly called a WORD, because it was the largest unit of data you could store in one register, which made it a good analogue of what had been done for CHAR.

So on computers with a different CPU, WORD commonly refers to the size of the register. On the Saturn CPU, which uses 64-bit registers, a WORD is 64 bits.

When 32-bit x86 processors came out, WORD stayed 16 bits for compatibility reasons, and DWORD was created to extend it to 32 bits. The same is true of QWORD and 64 bits.

As for why hexadecimal is commonly used to describe a WORD, it comes from the nature of the WORD definition, which is tied to its register origin. In assembler programming you describe data in hexadecimal because processors only deal with binary integers (0s and 1s), and hexadecimal is a more compact way of writing binary that preserves some of its properties.

+1

To expand on Tim's answer: it is because converting hex to binary and back is very easy, since each hexadecimal digit corresponds to exactly 4 binary digits:

    0x1 = 0001
    0x2 = 0010
    ...
    0xD = 1101
    0xE = 1110
    0xF = 1111

So 0x2D = 0010 1101
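A tiny sketch of my own (using C++14 binary literals) that confirms the same digit-by-digit correspondence:

    #include <iostream>

    int main() {
        static_assert(0x2D == 0b00101101, "each hex digit is exactly one 4-bit group");

        // Pull the two nibbles back out with a shift and a mask.
        unsigned value = 0x2D;
        unsigned high  = (value >> 4) & 0xF;  // 0x2 -> 0010
        unsigned low   = value & 0xF;         // 0xD -> 1101
        std::cout << std::hex << high << ' ' << low << '\n';  // prints: 2 d
    }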

0

You have a very interesting and difficult question.

In short, there were two drivers that led to the existence of the two competing families of types, the DWORD-based one and the int-based one:

1) The desire to have cross-platform types on one hand, and types with fixed sizes on the other.

2) People's conservatism.

In any case, to give a detailed answer to your question and a reasonably good background for this area, we have to dig into the history of computers and start our story from the early days of computing.

First of all, there is such a thing as a machine word. A machine word is a chunk of binary data that is natural for a particular processor to handle. So the size of a machine word is highly processor-dependent and is generally equal to the size of the processor's internal registers. Usually it can be divided into two equal parts that the processor can also access as independent chunks of data. For example, on x86 processors the machine word size is 32 bits. This means that all the common registers (eax, ebx, ecx, edx, esi, edi, ebp, esp and eip) have the same size: 32 bits. But many of them can also be accessed as parts of the register. For example, you can access eax as a 32-bit chunk of data, ax as a 16-bit chunk, or even a single byte of it as an 8-bit chunk. Physically, though, it is all one 32-bit register. I think you can find a very good background on this area on Wikipedia (http://en.wikipedia.org/wiki/Word_(computer_architecture)). In short, the machine word is how many bits can be used as an integer operand of a single instruction. Even today, different processor architectures have different machine word sizes.

Well, now we have some understanding of the machine word; time to get back to the history of computing. The first Intel x86 processor that became popular was 16-bit. It came onto the market in 1978. At that time assembler was very popular, if not the main programming language. As you know, assembler is a very thin wrapper around the processor's native language, and because of that it is completely tied to the hardware. When Intel pushed the new 8086 processor onto the market, the first thing they needed for its success was to push an assembler for the new processor onto the market as well; nobody wants a processor that nobody knows how to program. And when Intel chose names for the different data types in the 8086 assembler, they made the obvious choice and named a 16-bit chunk of data a word, because the 8086 machine word is 16 bits. Half a machine word was called a byte (8 bits), and two words used as one operand were called a double word (32 bits). Intel used these terms in the processor manuals and in the assembler mnemonics (db, dw and dd for statically allocating a byte, word and double word).

Years passed, and in 1985 Intel moved from a 16-bit architecture to a 32-bit one with the introduction of the 80386 processor. But by that time there was a huge number of developers accustomed to a word being a 16-bit value, and a huge amount of software had been written in the honest belief that a word is 16 bits; a lot of existing code relied on that fact. So even though the size of the machine word had actually changed, the notation stayed the same, except that a new data type arrived in assembler: the quad word (64 bits), because the instructions that operated on two machine words stayed the same but the machine word had been widened. In the same way, the double quad word (128 bits) appeared with the 64-bit AMD64 architecture. As a result we have:

    byte   =   8 bit
    word   =  16 bit
    dword  =  32 bit
    qword  =  64 bit
    dqword = 128 bit

Note that the main thing about this family is that it is a family of strictly sized types, because it comes from, and is used in, assembler, which requires data types of constant size. Years go by, but the data types in this family keep the same constant sizes, even though their names no longer carry their original meaning.

On the other hand, over those same years high-level languages became more and more popular. And since those languages were designed with cross-platform development in mind, they looked at the sizes of their internal data types from a completely different perspective. If I understand correctly, no high-level language explicitly states that some of its internal data types have a fixed constant size that will never change in the future. Let us look at C++ as an example. The C++ standard says:

 "The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set and is composed of a contiguous sequence of bits, the number of which is implementa- tion-defined. The least significant bit is called the low-order bit; the most significant bit is called the high-order bit. The memory available to a C++ program consists of one or more sequences of contiguous bytes. Every byte has a unique address." 

So we see something surprising: in C++ even the byte does not have a constant size. Even though we are used to thinking of a byte as 8 bits, according to C++ it can be not only 8 but also 9, 10, 11, 12 and so on bits in size.

 "There are five signed integer types: "signed char", "short int", "int", and "long int"., and "long long int". In this list, each type provides at least as much storage as those preceding it in the list. Plain ints have the natural size suggested by the architecture of the execution environment; the other signed integer types are provided to meet special needs." 

This passage states two main requirements:

1) sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)

2) Plain int has the natural size suggested by the architecture of the execution environment. That means int should have the machine word size of the target processor architecture.

You can go through the entire C++ standard text, but you will not find anything like "the size of int is 4 bytes" or "long is 64 bits". The sizes of particular C++ integer types can change when you move from one processor architecture to another or from one compiler to another. But even when you write a program in C++, you will periodically face the requirement to use data types with a known, constant size.
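Incidentally, when you do need a known, constant size in portable C++ today, the usual tool is the fixed-width typedefs from <cstdint>; a minimal sketch:

    #include <cstdint>

    // Fixed-width types keep their size across conforming platforms (where they
    // are provided), unlike int/long, whose sizes may differ between compilers
    // and target architectures.
    std::uint8_t  a_byte  = 0xFF;                   // always  8 bits
    std::uint16_t a_word  = 0xFFFF;                 // always 16 bits
    std::uint32_t a_dword = 0xFFFFFFFFu;            // always 32 bits
    std::uint64_t a_qword = 0xFFFFFFFFFFFFFFFFull;  // always 64 bits

    // Checked at compile time.
    static_assert(sizeof(a_word)  == 2, "uint16_t occupies 2 bytes");
    static_assert(sizeof(a_dword) == 4, "uint32_t occupies 4 bytes");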

At least in earlier days, compiler developers followed these standard requirements, but now we see conservatism come into play again. People are used to thinking that an int is 32 bits wide and can store values in the range from -2,147,483,648 to 2,147,483,647. Earlier, when the industry crossed the line between 16-bit and 32-bit architectures, the second requirement was strictly observed: when you used a C++ compiler to build a 16-bit program, the compiler used a 16-bit int, the "natural size" for a 16-bit processor, and when you used another C++ compiler to build a 32-bit program from the same source code, the compiler used a 32-bit int, the "natural size" for a 32-bit processor. Nowadays, if you look at the Microsoft C++ compiler, for example, you will find that it uses a 32-bit int regardless of the target processor architecture (32-bit or 64-bit), simply because people are used to thinking that int is 32 bits!

To summarize: there are two families of data types, dword-based and int-based. The motivation for the second is obvious: cross-platform application development. The motivation for the first is all the cases where the exact size of a value actually matters. Among others, the following cases can be named:

1) You have to store values in a predetermined range in a class or other data structure that will have a huge number of instances at runtime. If you use an int-based type to store that value, it will carry a large memory overhead on some architectures and could potentially break the logic on others. For example, you need to manipulate values in the range 0 to 1,000,000. If you store them in an int, the program will behave correctly if int is 32-bit, will carry 4 bytes of memory overhead per instance if int is 64-bit, and will not work correctly at all if int is 16-bit.

2) Data involved in networking. To process your network protocol correctly on different PCs, you have to specify it in a size-exact format that describes all the packets and headers precisely. Your networking will be completely broken if on one PC your protocol header is 20 bytes long with a 32-bit int and on another PC it is 28 bytes long with a 64-bit int.

3) Your program stores values used by special processor instructions, or your program communicates with modules or code fragments written in assembler.

4) You need to store values used to communicate with devices. Each device has its own specification describing what kind of input it requires and in what form it provides its output. If a device requires a 16-bit value as input, it must receive exactly a 16-bit value, regardless of the size of int and even regardless of the machine word size of the processor in the system where the device is installed.

5) Your algorithm relies on integer overflow logic. For example, you have an array of 2^16 entries, and you want to walk over it endlessly and sequentially, updating the entries as you go. If you use a 16-bit int, the program works fine, but the moment you switch to a 32-bit int you get out-of-range index accesses (see the sketch after this list).
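A minimal sketch of case 5 (my own illustration, assuming a 2^16-entry table): the 16-bit index wraps back to 0 on its own, while a wider index type would simply run past the end of the array.

    #include <cstdint>
    #include <vector>

    int main() {
        std::vector<int> table(1 << 16);  // 2^16 entries, valid indices 0..65535

        std::uint16_t index = 0;
        for (long step = 0; step < 200000; ++step) {
            table[index] += 1;  // always in range: a uint16_t can only hold 0..65535
            ++index;            // 65535 + 1 wraps around to 0 (unsigned wrap-around is well defined)
        }
        // With a 32-bit index and no explicit "% 65536", the same loop would walk
        // past the end of the table instead of wrapping.
    }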

Because of all this, Microsoft uses both families of data types: int-based types where the actual size of the data does not matter much, and DWORD-based types where it does. Even then, Microsoft defines both as macros/typedefs, to keep the ability to quickly and easily adapt the virtual type system it uses to a particular processor architecture and/or compiler by assigning them the correct C++ equivalents.
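For reference, the DWORD family is defined in the Windows headers roughly like this (a simplified sketch of what <windef.h>/<minwindef.h> contain, not a verbatim copy; the QWORD line in particular is an assumption, since that size is usually spelled ULONGLONG or DWORD64 in the SDK):

    // Simplified sketch of the Windows fixed-size family.
    typedef unsigned char      BYTE;   //  8 bits
    typedef unsigned short     WORD;   // 16 bits
    typedef unsigned long      DWORD;  // 32 bits (long stays 32-bit for MSVC on both x86 and x64)
    typedef unsigned long long QWORD;  // 64 bits (hypothetical spelling; see note above)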

I hope I have covered the question of the origin of these data types and their differences reasonably well.

So we can move on to the question of why hexadecimal digits are used to denote the values of DWORD-based data types. There are actually several reasons:

1) If we use strictly sized binary data types, it is natural to expect that we will want to look at them in binary form.

2) It is much easier to make sense of bit masks when they are written in a binary-friendly form. Agree that it is far easier to see which bit is set and which is cleared when the value looks like this:

 1100010001011001 

than when it is encoded like this:

 50265 

3) Data written in hexadecimal and described as a single DWORD-based value has a constant length, whereas the same data written in decimal has a variable length. Note that even when a small number is written in hex, the full-width description of the value is given:

 0x00000100 

instead of

 0x100 

This property of hex encoding is very attractive when you have to analyze huge amounts of binary data, for example in a hex editor, or when simply inspecting the memory used by your program in the debugger after you hit a breakpoint. Agree that it is much more pleasant to look at neat columns of values than at a heap of poorly aligned, variable-size values.
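A small sketch of producing those neat, fixed-width columns yourself in C++ with <iomanip>:

    #include <cstdint>
    #include <iomanip>
    #include <iostream>

    // Print a value as a zero-padded, full-width 32-bit hex number, e.g. 0x00000100.
    void print_dword(std::uint32_t value) {
        std::cout << "0x"
                  << std::hex << std::uppercase
                  << std::setw(8) << std::setfill('0')
                  << value << '\n';
    }

    int main() {
        print_dword(0x100);       // 0x00000100
        print_dword(0xDEADBEEF);  // 0xDEADBEEF
    }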

So, we have decided that we want a binary-friendly encoding, and we have three options: plain binary, octal, and hexadecimal. People prefer hexadecimal because it is the shortest of the available encodings. Just compare

 10010001101000101011001111000 

and

 0x12345678

Can you quickly find the number of the bit that is set in the following value?

 00000000000100000000000000000000

And in this one?

 0x00100000 

In the second case you can quickly split the number into four separate bytes:

    0x00 0x10 0x00 0x00
       3    2    1    0

where, within each byte, the first digit denotes the 4 most significant bits and the second digit denotes the 4 least significant bits. After you spend some time working with hexadecimal values, you will remember the bit pattern of each hexadecimal digit and translate between the two in your head without any trouble:

    0 - 0000    4 - 0100    8 - 1000    C - 1100
    1 - 0001    5 - 0101    9 - 1001    D - 1101
    2 - 0010    6 - 0110    A - 1010    E - 1110
    3 - 0011    7 - 0111    B - 1011    F - 1111

So it only takes a second or two to see that bit number 20 is set!
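The same mental arithmetic written out as a small sketch in C++: each hex digit covers 4 bits, so the single 1 sitting in hex digit position 5 means bit 5 * 4 = 20.

    #include <cstdint>
    #include <iostream>

    int main() {
        std::uint32_t value = 0x00100000;

        // Scan all 32 bit positions and report the ones that are set.
        for (int bit = 0; bit < 32; ++bit)
            if (value & (std::uint32_t{1} << bit))
                std::cout << "bit " << bit << " is set\n";  // prints: bit 20 is set
    }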

People use hexadecimal because it is the shortest, most convenient encoding that directly mirrors the binary form of the data.

0
