X86_64: stack frame pointer is almost useless?

Question

Linux x86_64.
gcc 5.x

I compared the assembly output of the same code compiled with and without -fomit-frame-pointer (gcc at "-O3" turns this option on by default).

 pushq %rbp
 movq %rsp, %rbp
 ...
 popq %rbp

My question is:

If I omit the frame pointer globally, even, in the extreme case, when compiling an operating system, is there a catch?

I know that interrupts use this information, so is the option only good for user space?

c assembly gcc x86-64 stackframe

Kroma Jul 14 '15 at 21:32

user781847 · Accepted Answer · 2015-07-15T09:04:57+0000

Compilers always generate self-consistent code, so omitting the frame pointer is fine as long as you do not use external or hand-written code that makes assumptions about it (for example, code that relies on the value of rbp).

Interrupts do not use frame pointer information. They may use the current stack pointer to save a minimal context, but that depends on the type of interrupt and on the OS (a hardware interrupt probably uses a Ring 0 stack).
See the Intel manuals for more information on this.

About the usefulness of the frame pointer:
A few years ago, after compiling a couple of simple routines and looking at the generated 64-bit assembly code, I had the same question.
If you do not mind reading a lot of notes that I wrote for myself back then, here they are.

Note: whether something is useful is a somewhat relative question. Writing assembly code for the current main 64-bit ABIs, I found myself using the stack frame less and less. However, this is just my coding style and opinion.

I like using the frame pointer and writing the prologue and epilogue of a function, but I also like direct, uncomfortable answers, so here is how I see it:

Yes, the frame pointer on x86_64 is almost useless

Be careful: it is not completely useless, especially for humans, but the compiler no longer needs it. To understand why we have a frame pointer in the first place, it helps to recall some history.

Back in the real mode (16-bit) days

When Intel processors supported only 16-bit mode, there were some restrictions on how the stack could be accessed. In particular, this instruction was (and still is) invalid:

 mov ax, WORD [sp+10h]

because sp cannot be used as a base register. Only a few designated registers can be used for this purpose, for example bx or the better-known bp.
Nowadays this is not a detail many people pay attention to, but bp has an advantage over the other base registers: it implicitly uses ss as the segment/selector register, just like the implicit uses of sp (by push, pop, etc.), and just like esp does on later 32-bit processors.
Even if your program was scattered all over memory, with each segment register pointing to a different area, bp and sp behaved the same way; after all, that was what the designers intended.

So a stack frame was usually needed, and therefore a frame pointer.
bp effectively partitioned the stack into four parts: the argument area, the return address, the old bp (just a WORD), and the local variable area. Each area is identified by the offset used to access it: positive for the arguments and the return address, zero for the old bp, negative for the local variables.
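The four-area layout can be checked with a toy model. The following is only a sketch under stated assumptions: a near call (2-byte return address), arguments pushed right to left, and a single WORD-sized local. The names Stack16, push16, at_bp and enter_frame are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a 16-bit stack that grows downward.
   Assumption: all names here are invented for illustration. */
enum { STACK_BYTES = 64 };

typedef struct {
    uint16_t mem[STACK_BYTES / 2]; /* stack memory, one slot per WORD */
    uint16_t sp;                   /* byte offset of the top of the stack */
    uint16_t bp;                   /* byte offset held in the frame pointer */
} Stack16;

static void push16(Stack16 *s, uint16_t v) {
    assert(s->sp >= 2);
    s->sp -= 2;
    s->mem[s->sp / 2] = v;
}

/* Read the WORD at [bp + off]; off may be negative. */
static uint16_t at_bp(const Stack16 *s, int off) {
    int addr = (int)s->bp + off;
    assert(addr >= 0 && addr < STACK_BYTES && addr % 2 == 0);
    return s->mem[addr / 2];
}

/* Build the frame for a near call f(a, b) with one local variable. */
static void enter_frame(Stack16 *s, uint16_t a, uint16_t b,
                        uint16_t ret_addr, uint16_t local) {
    s->sp = STACK_BYTES;
    s->bp = 0x1234;       /* caller's bp, an arbitrary value */
    push16(s, b);         /* arguments, pushed right to left */
    push16(s, a);
    push16(s, ret_addr);  /* pushed by the near call */
    push16(s, s->bp);     /* push bp */
    s->bp = s->sp;        /* mov bp, sp */
    push16(s, local);     /* local variable area */
}
```

After enter_frame(&s, 10, 20, 0x0100, 99), the arguments are read with at_bp(&s, 4) and at_bp(&s, 6), the return address with at_bp(&s, 2), the old bp with at_bp(&s, 0), and the local with at_bp(&s, -2).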

Extended effective addresses

As Intel processors evolved, more flexible 32-bit addressing modes were added.
In particular, any 32-bit general-purpose register can be used as a base register, including esp.
Since instructions such as

 mov eax, DWORD [esp+10h]

were now valid, the stack frame and the frame pointer seemed doomed.
That was probably not the case, at least at the beginning.
It is true that esp can now be used for everything, but partitioning the stack into the four areas mentioned above is still useful, especially for humans.

Without a frame pointer, every push or pop changes the offset of an argument or local variable relative to esp, producing code that looks unintuitive at first sight. Consider how to implement the following C routine with the cdecl calling convention:

 int my_routine(int a, int b)
 {
     return my_add(a, b);
 }

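without and with a stack frame:

 my_routine:
   push DWORD [esp+08h]
   push DWORD [esp+08h]
   call my_add
   add esp, 08h            ; cdecl: the caller removes the arguments
   ret

 my_routine:
   push ebp
   mov ebp, esp
   push DWORD [ebp+0Ch]
   push DWORD [ebp+08h]
   call my_add
   add esp, 08h            ; cdecl: the caller removes the arguments
   pop ebp
   ret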

At first sight the first version seems to push the same value twice. In reality it pushes the two separate arguments: the first push decrements esp, so the same effective-address expression refers to the other argument when the second push is executed.
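The effect can be reproduced with a small simulation. This is a minimal sketch, not real machine code: Stack32, load32, push32 and frameless_pushes are hypothetical names, and the return address value is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a 32-bit stack that grows downward.
   Assumption: all names here are invented for illustration. */
enum { STACK_BYTES = 64 };

typedef struct {
    uint32_t mem[STACK_BYTES / 4]; /* one slot per DWORD */
    uint32_t esp;                  /* byte offset of the top of the stack */
} Stack32;

static uint32_t load32(const Stack32 *s, uint32_t addr) {
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    return s->mem[addr / 4];
}

static void push32(Stack32 *s, uint32_t v) {
    assert(s->esp >= 4);
    s->esp -= 4;
    s->mem[s->esp / 4] = v;
}

/* Mimic the frameless my_routine up to (not including) the call and report
   the two values my_add would receive as its first and second argument. */
static void frameless_pushes(uint32_t a, uint32_t b,
                             uint32_t *first_arg, uint32_t *second_arg) {
    Stack32 s = { .esp = STACK_BYTES };
    push32(&s, b);                      /* the caller pushes b ... */
    push32(&s, a);                      /* ... then a */
    push32(&s, 0xDEADBEEF);             /* return address (arbitrary value) */

    push32(&s, load32(&s, s.esp + 8));  /* push DWORD [esp+08h]: reads b */
    push32(&s, load32(&s, s.esp + 8));  /* same expression: now reads a */

    *first_arg  = load32(&s, s.esp);     /* lowest address: first argument */
    *second_arg = load32(&s, s.esp + 4); /* next slot: second argument */
}
```

Calling frameless_pushes(3, 7, &x, &y) leaves 3 in x and 7 in y, so the callee sees the arguments in the right order even though both pushes use the same operand text.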

If you add local variables (especially a lot of them), the code quickly becomes hard to read: does mov eax, [esp+0CAh] refer to a local variable or to an argument? With a stack frame, the offsets of the arguments and of the local variables are fixed.

Even compilers at first still preferred the fixed offsets given by the frame base pointer. I saw this behaviour change first with gcc.
In a debug build the stack frame really does add clarity to the code; it lets an (experienced) programmer follow what is happening and, as pointed out in a comment, makes it easier to recover the stack frames.
Modern compilers, however, are good at arithmetic: they can easily keep count of the stack pointer movements and generate the corresponding offsets from esp, omitting the stack frame for faster execution.
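The bookkeeping involved is small. The sketch below assumes 32-bit cdecl code and uses two hypothetical helpers, esp_relative and ebp_relative, which turn a slot's offset from esp at function entry into the offset to use later; it is an illustration, not how any particular compiler is implemented.

```c
#include <assert.h>

/* entry_off: offset of a slot from esp at function entry
              (4 for the first argument, 8 for the second, ...).
   pushed:    bytes pushed, or subtracted from esp, since entry.
   Assumption: both helper names are invented for illustration. */
static int esp_relative(int entry_off, int pushed) {
    assert(pushed >= 0 && pushed % 4 == 0);
    return entry_off + pushed;   /* esp moved down, so the offset grows */
}

/* After "push ebp; mov ebp, esp", ebp equals entry esp minus 4 and no
   longer moves, so the offset is a constant for the whole function. */
static int ebp_relative(int entry_off) {
    return entry_off + 4;
}
```

For the my_routine listings, esp_relative(8, 0) and esp_relative(4, 4) both give 8, which is why the frameless version pushes [esp+08h] twice, while ebp_relative(8) and ebp_relative(4) give 0Ch and 08h.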

When CISC requires data alignment

Until the introduction of the SSE instructions, Intel processors never asked much of programmers compared with their RISC brothers.
In particular, they never required data alignment: we could access 32-bit data at an address that is not a multiple of 4 without any serious complaint (depending on the DRAM data width, this could increase latency).
SSE used 16-byte operands that had to be accessed on a 16-byte boundary. As the SIMD paradigm became efficiently implemented in hardware and more popular, alignment on a 16-byte boundary became important.

The main 64-bit ABIs now require it: the stack must be aligned on paragraphs (i.e. 16 bytes).
We are normally called in such a way that the stack is aligned after the prologue, but suppose we did not have that guarantee. We would need one of these prologues:

 push rbp                   push rbp
 mov rbp, rsp               mov rbp, rsp
 and spl, 0f0h              sub rsp, xxx
 sub rsp, 10h*k             and spl, 0f0h

Either way, the stack is aligned after these prologues. However, we can no longer use a negative offset from rbp to access local variables that need alignment, because the frame pointer itself is not aligned.
We need to use rsp. We could arrange a prologue in which rbp points to the top of an aligned area of local variables, but then the arguments would be at unknown offsets.
We could arrange a complex stack frame (possibly with more than one pointer), but the whole point of the old-fashioned frame base pointer was its simplicity.
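A little arithmetic shows why the offset from rbp to the aligned locals is not a compile-time constant. This is a sketch that models the first prologue only; align_down16 and rbp_to_locals are hypothetical helper names, and the addresses used with them are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Same effect as "and spl, 0f0h": clear the low four bits of rsp. */
static uint64_t align_down16(uint64_t rsp) {
    return rsp & ~(uint64_t)0xF;
}

/* Distance in bytes between rbp and the start of an aligned locals area
   of `locals` bytes, for the prologue
       push rbp / mov rbp, rsp / and spl, 0f0h / sub rsp, 10h*k
   Assumption: both helper names are invented for illustration. */
static uint64_t rbp_to_locals(uint64_t rsp_at_entry, uint64_t locals) {
    assert(locals % 16 == 0);
    uint64_t rbp = rsp_at_entry - 8;            /* push rbp; mov rbp, rsp */
    uint64_t rsp = align_down16(rbp) - locals;  /* and spl, 0f0h; sub rsp */
    return rbp - rsp;
}
```

With 32 bytes of locals, an entry rsp of 0x1000 gives a distance of 40 bytes and an entry rsp of 0x1008 gives 32 bytes, so the same local would sit at a different rbp offset depending on how the function happened to be called.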

So we could use the frame pointer to access the arguments on the stack and the stack pointer for the local variables; fair enough.
Alas, the role of the stack in passing arguments has shrunk: for a small number of arguments (currently four) it is not used at all, and in the future it will probably be used even less.

So we do not use the frame pointer for local variables (mostly), nor for arguments (mostly). What do we use it for, then?

It saves a copy of the original rsp, so a mov is enough to restore the stack pointer on function exit. If the stack is aligned with an and, which is not invertible, an original copy is necessary.
In fact, some ABIs guarantee that the stack is aligned after the standard prologue, which lets us use the frame pointer as usual.
Some variables do not need alignment and can be accessed through an unaligned frame pointer; this is usually true for hand-written code.
Some functions take more than four parameters.
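The first point can be made concrete with a register-level sketch. It models the second prologue shown earlier (subtract, then align) and the matching epilogue; the Regs structure and the prologue and epilogue functions are hypothetical, and the saved rbp is kept in a field instead of in simulated stack memory.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: this structure and both functions are invented for
   illustration; saved_rbp stands in for the stack slot written by push. */
typedef struct {
    uint64_t rsp;
    uint64_t rbp;
    uint64_t saved_rbp;
} Regs;

static void prologue(Regs *r, uint64_t locals) {
    r->saved_rbp = r->rbp;       /* push rbp (value) */
    r->rsp -= 8;                 /* push rbp (stack pointer movement) */
    r->rbp = r->rsp;             /* mov rbp, rsp */
    r->rsp -= locals;            /* sub rsp, xxx */
    r->rsp &= ~(uint64_t)0xF;    /* and spl, 0f0h: low bits are discarded */
}

static void epilogue(Regs *r) {
    r->rsp = r->rbp;             /* mov rsp, rbp: undoes sub and and */
    r->rbp = r->saved_rbp;       /* pop rbp (value) */
    r->rsp += 8;                 /* pop rbp (stack pointer movement) */
}
```

Starting from rsp values 0x1008 and 0x1000 with 24 bytes of locals, both runs end the prologue with rsp equal to 0xFE0, so the aligned value alone cannot tell them apart; the copy kept in rbp is what lets the epilogue return each one to its own starting value.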

Summary

The frame pointer is a vestigial paradigm from 16-bit programs that proved to be still useful on 32-bit machines because of its simplicity and clarity when accessing local variables and arguments.
On 64-bit machines, however, the strict alignment requirements take away most of that simplicity and clarity; the frame pointer nevertheless remains in use in debug builds.

As for the fact that the frame pointer can be used to do fun things: that is true, I guess. I have never seen such code, but I can imagine how it would work.
I have, however, focused on the housekeeping role of the frame pointer, since that is how I have always seen it used.
All the crazy things can be done with any pointer set to the same value as the frame pointer; I give the latter a more "special" role.
For example, VS2013 sometimes uses rdi as a "frame pointer", but I do not consider it a real frame pointer unless it uses rbp/ebp/bp.
To me, the use of rdi means a frame pointer omission optimization :)
