Double alignment

After the discussion in this post, I realized that the main reason for aligning structure members is performance (plus some architectural limitations).

If we look at Microsoft (Visual C++), Borland/CodeGear (C++Builder), Digital Mars (DMC), and GNU (GCC) when compiling for 32-bit x86, the alignment for int is 4 bytes, and if an int is not aligned, two rows of memory banks may have to be read.

My question is: why not give double a 4-byte alignment as well? A 4-byte-aligned double would also cause two rows of memory banks to be read...

For example, in the following case, since double is 8-byte aligned, the members take sizeof(char) + 7 bytes of padding + sizeof(double) + sizeof(int) = 20 bytes, and trailing padding (to keep the size a multiple of 8) makes the actual size of the structure 24 bytes:

    typedef struct structc_tag {
        char   c;
        double d;
        int    s;
    } structc_t;

Thanks.

c++ c memory-alignment structure
Jun 19 '12 at 19:50
4 answers

Extended comment:

According to the GCC documentation on -malign-double:

Aligning double variables on a two-word boundary produces code that runs somewhat faster on a Pentium at the expense of more memory.

On x86-64, -malign-double is enabled by default.

Warning: if you use the -malign-double switch, structures containing the above types are aligned differently than the published application binary interface specifications for the 386 and are not binary compatible with structures in code compiled without that switch.

"Word" here means an i386 word, which is 32 bits.

Windows uses 64-bit alignment of double values even in 32-bit mode, while Unices that follow the SysV i386 ABI use 32-bit alignment. The 32-bit Windows API, Win32, comes from Windows NT 3.1, which, unlike current-generation versions of Windows, targeted not only the Intel i386 but also Alpha, MIPS, and even the obscure Intel i860. Since native RISC systems such as Alpha and MIPS require double values to be 64-bit aligned (otherwise a hardware fault occurs), portability may have been the justification for 64-bit alignment in the Win32 i386 ABI.

64-bit x86 systems, also known as AMD64, x86-64, or x64, expect double values to be 64-bit aligned; otherwise the access is misaligned and the hardware has to perform an expensive fixup that considerably slows down memory access. Therefore, double values are 64-bit aligned in all modern x86-64 ABIs (SysV and Windows).

Jun 19 '12 at 22:15

Most compilers automatically align data values to the platform word size or to the data size, whichever is smaller. The vast majority of consumer and enterprise processors use a 32-bit word size. (Even 64-bit systems usually use 32 bits as their native word size.)

Thus, the order of the members in your structure can waste some memory. In your particular case, you are fine. I have added the actual memory footprint in the comments:

    typedef struct structc_tag {
        char   c;  // 1 byte
                   // 3 bytes (padding)
        double d;  // 8 bytes
        int    s;  // 4 bytes
    } structc_t;   // total: 16 bytes

The same rule also applies to the structure as a whole: its total size is padded to a multiple of its alignment. So even if you rearrange the members so that the smallest field is last, you still get a structure of the same size (16 bytes):

    typedef struct structc_tag {
        double d;  // 8 bytes
        int    s;  // 4 bytes
        char   c;  // 1 byte
                   // 3 bytes (padding)
    } structc_t;   // total: 16 bytes

If you declared more fields smaller than 4 bytes, you could see some memory savings by grouping them by size. For example:

    typedef struct structc_tag {
        double d1;  // 8 bytes
        double d2;  // 8 bytes
        double d3;  // 8 bytes
        int    s1;  // 4 bytes
        int    s2;  // 4 bytes
        int    s3;  // 4 bytes
        short  s4;  // 2 bytes
        short  s5;  // 2 bytes
        short  s6;  // 2 bytes
        char   c1;  // 1 byte
        char   c2;  // 1 byte
        char   c3;  // 1 byte
                    // 3 bytes (padding)
    } structc_t;    // total: 48 bytes

Declaring a carelessly ordered structure can waste a lot of memory, because the compiler does not reorder your members for you (in general it won't without explicit instruction, and in C it is not allowed to):

    typedef struct structc_tag {
        int  s1;  // 4 bytes
        char c1;  // 1 byte
                  // 3 bytes (padding)
        int  s2;  // 4 bytes
        char c2;  // 1 byte
                  // 3 bytes (padding)
        int  s3;  // 4 bytes
        char c3;  // 1 byte
                  // 3 bytes (padding)
    } structc_t;  // total: 24 bytes
                  // (9 bytes wasted, or 38%)
                  // (optimal size: 16 bytes (1 byte wasted))

Doubles are larger than 32 bits, so according to the rule in the first paragraph they are 32-bit aligned. Someone mentioned a compiler option that changes alignment, and that the default differs between 32-bit and 64-bit systems; this is also true. So the real answer about doubles is that it depends on the platform and the compiler.

Memory performance is governed by words: loading from memory happens in word-sized steps, and how many steps are needed depends on where the data is placed. If the data fits within one word (i.e., it is word-aligned), only that word needs to be loaded. If it is not aligned correctly (e.g., an int at address 0x2), the processor must load 2 words to read its value correctly. The same applies to doubles, which normally span 2 words but take 3 if they are misaligned. On 64-bit systems, where 64-bit quantities can be loaded in one go, doubles behave like 32-bit ints on 32-bit systems: if they are correctly aligned, they can be fetched with one load; otherwise they require 2.

Jun 19 '12 at 20:16

First of all, it is the architecture that imposes the alignment requirement: some architectures tolerate unaligned memory accesses, others do not.

As an example, take the Windows x86 32-bit platform: there, the alignment requirements for int and double are 4 bytes and 8 bytes respectively.

It is clear why int is required to be 4-byte aligned: so that the processor can read the whole value with a single access.

The reason the alignment requirement for double is 8 bytes rather than 4 bytes is this: if it were 4 bytes, consider what would happen if a double were located at address 60 and the cache line size were 64 bytes. The double would occupy addresses 60 to 67, so the processor would need to load 2 cache lines from memory into the cache. If the double is 8-byte aligned, this cannot happen: the double is always contained in a single cache line and is never split between two.

    address:  ... 58 59 | 60 61 62 63 | 64 65 66 67 | 68 69 70 71 ...
                        |<-------- double --------->|
              <------ Cache Line 1 -->|<-- Cache Line 2 ------>
                                      ^ line boundary at 64
Mar 30 '14 at 16:42

The question is very platform-specific in terms of processor architecture. For example, on architectures that impose a penalty for accessing addresses that are not 4-byte aligned, aligning your variables (in fact, their addresses) to 4 bytes can avoid that penalty.

Compilers are pretty good at such things, especially when you tell them which target processor architecture to optimize for: they can then do most of this for you, along with many other optimizations. Take a look at GCC's -march flag, for example, which lets you select the target processor architecture.

Jun 19 '12 at 19:57