Consider the following program:
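The listing itself did not survive here, so this is a reconstruction inferred from the assembly output, not necessarily the exact original. The constants 45 and 43 are the ASCII codes of '-' and '+', the format string comes from LC0, and the parameter types of some_func are an assumption (the assembly only shows three addresses being passed). The empty some_func body is a stand-in so the snippet links on its own; the assembly refers to it as an external symbol, and with the stub in the same file the optimizer may produce different code.

```c
#include <stdio.h>

/* Stand-in definition (assumption): the assembly shows _some_func as an
 * external symbol that receives the addresses of the three locals. */
void some_func(char *a, int *i, char *b)
{
    (void)a;
    (void)i;
    (void)b;
}

void stack_alignment(void)
{
    char a = '-';   /* movb $45   in the assembly */
    int  i = 1337;  /* movl $1337 in the assembly */
    char b = '+';   /* movb $43   in the assembly */

    /* Taking the addresses forces the variables to live on the stack. */
    some_func(&a, &i, &b);
    printf("%c|%i|%c", a, i, b);
}
```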
It generates the following assembly (the comments were added by me; I'm a complete newbie at assembly):

    $ vim stack-alignment.c
    $ gcc -c -S -O3 stack-alignment.c
    $ cat stack-alignment.s
            .file   "stack-alignment.c"
            .section .rdata,"dr"
    LC0:
            .ascii "%c|%i|%c\0"
            .text
            .p2align 2,,3
            .globl  _stack_alignment
            .def    _stack_alignment;   .scl    2;  .type   32; .endef
    _stack_alignment:
    LFB7:
            .cfi_startproc
            subl    $44, %esp
            .cfi_def_cfa_offset 48
            movb    $45, 26(%esp)       // local variable 'a'
            movl    $1337, 28(%esp)     // local variable 'i'
            movb    $43, 27(%esp)       // local variable 'b'
            leal    27(%esp), %eax
            movl    %eax, 8(%esp)
            leal    28(%esp), %eax
            movl    %eax, 4(%esp)
            leal    26(%esp), %eax
            movl    %eax, (%esp)
            call    _some_func
            movsbl  27(%esp), %eax
            movl    %eax, 12(%esp)
            movl    28(%esp), %eax
            movl    %eax, 8(%esp)
            movsbl  26(%esp), %eax
            movl    %eax, 4(%esp)
            movl    $LC0, (%esp)
            call    _printf
            addl    $44, %esp
            .cfi_def_cfa_offset 4
            ret
            .cfi_endproc
    LFE7:
            .def    _some_func; .scl    2;  .type   32; .endef
            .def    _printf;    .scl    2;  .type   32; .endef

As you can see, there are 3 local variables (a, i and b) with sizes of 1 byte, 4 bytes and 1 byte. Including padding, that comes to 12 bytes (assuming the compiler aligns to 4 bytes).
Wouldn't it be more memory-efficient if the compiler changed the order of the variables to (a, b, i)? Then only 8 bytes would be needed.
Here is a graphical representation:

       3 bytes unused                  3 bytes unused
         vvvvvvvvvvv                     vvvvvvvvvvv
    +---+---+---+---+---+---+---+---+---+---+---+---+
    | a |   |   |   |       i       | b |   |   |   |
    +---+---+---+---+---+---+---+---+---+---+---+---+

                    |
                    v

    +---+---+---+---+---+---+---+---+
    | a | b |   |   |       i       |
    +---+---+---+---+---+---+---+---+
             ^^^^^^^
         2 bytes unused

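To double-check the 12-versus-8 arithmetic, here is a small sketch that uses two structs as stand-ins for the two layouts. The struct and function names are made up for illustration. Structs are used only because C lays out their members in declaration order, which makes the padding predictable; this says nothing about what the compiler does with local variables. The numbers assume a 4-byte int with 4-byte alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the two layouts discussed here. */
struct declared_order { char a; int i; char b; };  /* a, i, b */
struct reordered      { char a; char b; int i; };  /* a, b, i */

/* Prints the member offsets and total size of both layouts. */
void print_layouts(void)
{
    printf("declared:  a@%u i@%u b@%u size=%u\n",
           (unsigned)offsetof(struct declared_order, a),
           (unsigned)offsetof(struct declared_order, i),
           (unsigned)offsetof(struct declared_order, b),
           (unsigned)sizeof(struct declared_order));
    printf("reordered: a@%u b@%u i@%u size=%u\n",
           (unsigned)offsetof(struct reordered, a),
           (unsigned)offsetof(struct reordered, b),
           (unsigned)offsetof(struct reordered, i),
           (unsigned)sizeof(struct reordered));
}
```

Under the stated assumption, the first layout should report a size of 12 (a at 0, i at 4, b at 8) and the second a size of 8 (a at 0, b at 1, i at 4), matching the diagram.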
Is the compiler allowed to perform this optimization (according to the C standard, etc.)?
- If not (which is what the assembly output seems to show), why not?
- If so, why doesn't this happen above?