The trivial function that I am compiling with gcc and clang:
void test() { printf("hm"); printf("hum"); }
$ gcc test.c -fomit-frame-pointer -masm=intel -O3 -S
        sub     rsp, 8
        .cfi_def_cfa_offset 16
        mov     esi, OFFSET FLAT:.LC0
        mov     edi, 1
        xor     eax, eax
        call    __printf_chk
        mov     esi, OFFSET FLAT:.LC1
        mov     edi, 1
        xor     eax, eax
        add     rsp, 8
        .cfi_def_cfa_offset 8
        jmp     __printf_chk
and
$ clang test.c -mllvm --x86-asm-syntax=intel -fomit-frame-pointer -O3 -S
# BB#0:
        push    rax
.Ltmp1:
        .cfi_def_cfa_offset 16
        mov     edi, .L.str
        xor     eax, eax
        call    printf
        mov     edi, .L.str1
        xor     eax, eax
        pop     rdx
        jmp     printf                  # TAILCALL
The difference I'm interested in is that gcc uses sub rsp, 8 / add rsp, 8 for the function prologue and epilogue, while clang uses push rax / pop rdx.
Why do the compilers use different function prologues? Which variant is better? push and pop certainly have shorter encodings, but are they faster or slower than sub and add?
The stack adjustment is there in the first place mainly because the ABI requires rsp to be 16-byte aligned in non-leaf procedures. I could not find any compiler flags that remove it.
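To spell out the alignment arithmetic as I understand it: the call instruction pushes an 8-byte return address, so a function starts with rsp 8 bytes off a 16-byte boundary, and a further 8-byte adjustment (either sub rsp, 8 or any push) restores the alignment before the next call. Below is a minimal sketch of that arithmetic in C. The helper names are hypothetical and exist only for illustration; nothing here is taken from either compiler.

```c
#include <stdint.h>

/* rsp right after `call` has pushed the 8-byte return address,
   given the caller's rsp at the call site. */
uint64_t rsp_at_entry(uint64_t rsp_at_call) { return rsp_at_call - 8; }

/* rsp after the callee's own 8-byte adjustment
   (sub rsp, 8 and push rax both move rsp down by 8). */
uint64_t rsp_after_adjust(uint64_t rsp_entry) { return rsp_entry - 8; }

/* Nonzero if rsp sits on a 16-byte boundary. */
int is_16_aligned(uint64_t rsp) { return rsp % 16 == 0; }
```

For any 16-byte-aligned value at the call site, is_16_aligned(rsp_at_entry(x)) is 0 and is_16_aligned(rsp_after_adjust(rsp_at_entry(x))) is 1, which is why some 8-byte adjustment appears in both listings.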
Judging by your answers, push and pop seem to be the better choice: push rax + pop rdx = 1 + 1 = 2 bytes, versus sub rsp, 8 + add rsp, 8 = 4 + 4 = 8 bytes. So the first pair saves 6 bytes at no cost.
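The byte counts can be checked against the standard x86-64 encodings of the four instructions. The following is only a sketch: the array and function names are mine, and it measures code size alone, saying nothing about execution speed.

```c
#include <stddef.h>

/* Machine-code bytes for each instruction in 64-bit mode. */
static const unsigned char PUSH_RAX[]  = { 0x50 };                   /* push rax   */
static const unsigned char POP_RDX[]   = { 0x5A };                   /* pop rdx    */
static const unsigned char SUB_RSP_8[] = { 0x48, 0x83, 0xEC, 0x08 }; /* sub rsp, 8 */
static const unsigned char ADD_RSP_8[] = { 0x48, 0x83, 0xC4, 0x08 }; /* add rsp, 8 */

/* Total size of clang's push/pop pair. */
size_t push_pop_bytes(void) { return sizeof PUSH_RAX + sizeof POP_RDX; }

/* Total size of gcc's sub/add pair. */
size_t sub_add_bytes(void) { return sizeof SUB_RSP_8 + sizeof ADD_RSP_8; }

/* Bytes saved by choosing push/pop over sub/add. */
size_t bytes_saved(void) { return sub_add_bytes() - push_pop_bytes(); }
```

Calling bytes_saved() gives the 6-byte difference computed above; the REX.W prefix (0x48) and the immediate byte are what make each sub/add four bytes long.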