    unsigned int one ( unsigned int, unsigned int );
    unsigned int two ( unsigned int, unsigned int );
    unsigned int myfun ( unsigned int x, unsigned int y, unsigned int z )
    {
        unsigned int a,b;
        a=one(x,y);
        b=two(a,z);
        return(a+b);
    }
compile and disassemble
    arm-none-eabi-gcc -c fun.c -o fun.o
    arm-none-eabi-objdump -D fun.o
compiler generated code
    00000000 <myfun>:
       0:   e92d4800    push    {fp, lr}
       4:   e28db004    add     fp, sp, #4
       8:   e24dd018    sub     sp, sp, #24
       c:   e50b0010    str     r0, [fp, #-16]
      10:   e50b1014    str     r1, [fp, #-20]
      14:   e50b2018    str     r2, [fp, #-24]
      18:   e51b0010    ldr     r0, [fp, #-16]
      1c:   e51b1014    ldr     r1, [fp, #-20]
      20:   ebfffffe    bl      0 <one>
      24:   e50b0008    str     r0, [fp, #-8]
      28:   e51b0008    ldr     r0, [fp, #-8]
      2c:   e51b1018    ldr     r1, [fp, #-24]
      30:   ebfffffe    bl      0 <two>
      34:   e50b000c    str     r0, [fp, #-12]
      38:   e51b2008    ldr     r2, [fp, #-8]
      3c:   e51b300c    ldr     r3, [fp, #-12]
      40:   e0823003    add     r3, r2, r3
      44:   e1a00003    mov     r0, r3
      48:   e24bd004    sub     sp, fp, #4
      4c:   e8bd4800    pop     {fp, lr}
      50:   e12fff1e    bx      lr
Short answer: the memory is "allocated" at both compile time and run time. Compile time in the sense that the compiler determines the size of the stack frame and what goes where in it. Run time in the sense that the memory itself is on the stack, which is a dynamic thing: the stack frame is taken from stack memory at run time, a bit like a malloc() and free().
It helps to know the calling convention here: x arrives in r0, y in r1 and z in r2. Then x has its home at fp-16, y at fp-20 and z at fp-24. The call to one() needs x and y, so it pulls those back from the stack. The result of one() goes into a, which is saved at fp-8, so that is the home for a. And so on.
The function one() is not really at address 0: this is a disassembly of the object file, not of a linked binary. Once the object is linked with the other objects and libraries, the missing pieces, such as where the external functions live, are patched in by the linker, and the calls to one() and two() get real addresses. (And the program will most likely not start at address 0.)
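To see that the target in the two bl instructions is only a placeholder, the branch field of ebfffffe can be decoded by hand. This is a minimal sketch, assuming the standard ARM (A32) B/BL encoding: a 24-bit signed word offset applied to the instruction's address plus 8. The name arm_branch_target is made up for this illustration.

```c
#include <stdint.h>

/* Decode the target of an ARM (A32) B/BL instruction.
 * addr is the address of the instruction, insn its 32-bit encoding.
 * Assumes the standard encoding: the low 24 bits are a signed offset
 * in words, relative to addr + 8 (the value the pc reads as). */
static uint32_t arm_branch_target(uint32_t addr, uint32_t insn)
{
    uint32_t imm24 = insn & 0x00ffffffu;

    /* Sign-extend the 24-bit field to 32 bits. */
    int32_t words = (int32_t)(imm24 ^ 0x00800000u) - 0x00800000;

    /* Words to bytes, then add to pc (addr + 8); wraps modulo 2^32. */
    return addr + 8u + (uint32_t)words * 4u;
}
```

For the bl at 0x20, arm_branch_target(0x20, 0xebfffffe) comes out as 0x20: the field holds -2 words, so taken literally the instruction branches to itself. Both calls carry the same field, which is another sign that the real offsets are still to be filled in at link time.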
I cheated a little. I knew that with no optimization enabled the compiler would build a stack frame, even though for a relatively simple function like this one there really is no reason for a stack frame:
Compile with just a little optimization
    arm-none-eabi-gcc -O1 -c fun.c -o fun.o
    arm-none-eabi-objdump -D fun.o
and the stack frame is gone, the local variables stay in registers.
    00000000 <myfun>:
       0:   e92d4038    push    {r3, r4, r5, lr}
       4:   e1a05002    mov     r5, r2
       8:   ebfffffe    bl      0 <one>
       c:   e1a04000    mov     r4, r0
      10:   e1a01005    mov     r1, r5
      14:   ebfffffe    bl      0 <two>
      18:   e0800004    add     r0, r0, r4
      1c:   e8bd4038    pop     {r3, r4, r5, lr}
      20:   e12fff1e    bx      lr
What the compiler decided to do was give itself more registers to work with by saving them on the stack. Why it saved r3 is a mystery at first glance, but that's another topic ...
Entering the function, r0 = x, r1 = y and r2 = z per the calling convention. We can leave r0 and r1 alone (try again with one(y,x) and see what happens), since they drop straight into one() and are never used again. The calling convention says that r0-r3 can be destroyed by a function, so we need to preserve z for later, and it is saved in r5. The result of one() comes back in r0 per the calling convention. Since two() can destroy r0-r3, we need to save a for use after the call to two(), and we need r0 for the call to two() anyway, so r4 now holds a.

We saved z in r5 (moved from r2) before the call to one(). We need the result of one() as the first parameter to two(), and it is already there in r0. We need z as the second, so we move r5, where we saved z, to r1, and then we call two(). The result of two() comes back in r0 per the calling convention. Since b + a = a + b from basic math properties, the final add before returning is r0 + r4, which is b + a, and the result goes in r0, which is the register used to return a value from a function per the convention. Then clean up the stack, restore the modified registers, and return.
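The register shuffle described above can be written back out as C, with one variable standing in for each register. This is only a sketch for following the listing: the bodies of one() and two() are invented stand-ins so the example links on its own (the real functions are external and unknown), and myfun_regs is a made-up name.

```c
/* Stand-in bodies, assumed purely for illustration. */
unsigned int one(unsigned int x, unsigned int y) { return x + y; }
unsigned int two(unsigned int a, unsigned int z) { return a * z; }

/* myfun() with each C variable playing the role of one ARM register. */
unsigned int myfun_regs(unsigned int r0, unsigned int r1, unsigned int r2)
{
    unsigned int r4, r5;

    r5 = r2;            /* mov r5, r2     : keep z safe across one()    */
    r0 = one(r0, r1);   /* bl  one        : x and y are already placed  */
    r4 = r0;            /* mov r4, r0     : keep a safe across two()    */
    r1 = r5;            /* mov r1, r5     : z becomes the 2nd argument  */
    r0 = two(r0, r1);   /* bl  two        : a is already in r0          */
    r0 = r0 + r4;       /* add r0, r0, r4 : b + a                       */
    return r0;          /* bx  lr         : result is returned in r0    */
}
```

With these stand-ins, myfun_regs(1, 2, 3) gives a = 3, b = 9 and returns 12, the same as the original myfun() would with the same one() and two().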
Since myfun() calls other functions using bl, and bl modifies the link register (r14), the value in the link register has to be preserved from entry into the function until the final return (bx lr) for us to be able to return from myfun(), so lr is pushed onto the stack. The convention says that we can destroy r0-r3 in our function but not other registers, so r4 and r5 are pushed onto the stack because we used them. Pushing r3 is not required from a calling convention point of view. I wonder if it was done in anticipation of a 64-bit-wide memory system, where two full 64-bit writes are cheaper than one 64-bit write and one 32-bit write, but you would need to know the alignment of the stack going in, so this is just a theory. (It does fit the ARM EABI rule that the stack pointer be 8-byte aligned at public function boundaries: pushing four registers instead of three keeps it that way.) There is no reason to preserve the value of r3 in this code.
Now take this knowledge and disassemble the code you were assigned (arm-...-objdump -D something.something) and do the same analysis. In particular, compare functions named main() with functions not named main (I deliberately did not use main()); the stack frame can have a size that makes no sense, or less sense than in other functions. In the non-optimized case above there are five things to store, x, y, z, a and b, which is 5 * 4 = 20 bytes. The sub sp, sp, #24 reserves one word more than that, most likely as padding to keep the stack 8-byte aligned (fp and lr were already saved by the push). I need to think about the stack pointer versus frame pointer thing for a bit. I think there is a command line argument to tell the compiler not to use a frame pointer: -fomit-frame-pointer. It saves an instruction:
    00000000 <myfun>:
       0:   e52de004    push    {lr}            ; (str lr, [sp, #-4]!)
       4:   e24dd01c    sub     sp, sp, #28
       8:   e58d000c    str     r0, [sp, #12]
       c:   e58d1008    str     r1, [sp, #8]
      10:   e58d2004    str     r2, [sp, #4]
      14:   e59d000c    ldr     r0, [sp, #12]
      18:   e59d1008    ldr     r1, [sp, #8]
      1c:   ebfffffe    bl      0 <one>
      20:   e58d0014    str     r0, [sp, #20]
      24:   e59d0014    ldr     r0, [sp, #20]
      28:   e59d1004    ldr     r1, [sp, #4]
      2c:   ebfffffe    bl      0 <two>
      30:   e58d0010    str     r0, [sp, #16]
      34:   e59d2014    ldr     r2, [sp, #20]
      38:   e59d3010    ldr     r3, [sp, #16]
      3c:   e0823003    add     r3, r2, r3
      40:   e1a00003    mov     r0, r3
      44:   e28dd01c    add     sp, sp, #28
      48:   e49de004    pop     {lr}            ; (ldr lr, [sp], #4)
      4c:   e12fff1e    bx      lr
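One way to compare this listing with the frame-pointer version is to convert the fp-relative offsets from the first listing into sp-relative ones. The sketch below only restates the arithmetic of that prologue (push {fp, lr}; add fp, sp, #4; sub sp, sp, #24); the names FP_MINUS_SP and sp_offset are made up for this illustration.

```c
/* In the frame-pointer listing, after the prologue:
 *   fp = (sp after the push) + 4, then sp drops by another 24,
 * so fp sits 4 + 24 = 28 bytes above the final sp. */
enum { FP_MINUS_SP = 4 + 24 };

/* Convert an fp-relative offset (for example -8, the home of a)
 * into the equivalent sp-relative offset. */
static int sp_offset(int fp_offset)
{
    return FP_MINUS_SP + fp_offset;
}
```

sp_offset(-8) is 20 and sp_offset(-24) is 4, which are the slots this listing uses for a ([sp, #20]) and z ([sp, #4]). The locals end up at the same distance from sp in both builds; only the register used to address them changes.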
Optimization saves a lot more though ...
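To put rough numbers on that, the three listings can be tallied by instruction count and by stack bytes used. A small sketch using only values read off the disassembly above; the struct and function names are made up for this illustration.

```c
/* Numbers read off the three listings in this answer. */
struct listing {
    const char *flags;      /* extra compiler flags used             */
    unsigned last_addr;     /* address of the final bx lr            */
    unsigned pushed_regs;   /* number of registers in the push       */
    unsigned sub_sp;        /* bytes reserved by sub sp, sp, #N      */
};

static const struct listing listings[] = {
    { "(none)",               0x50, 2, 24 },
    { "-O1",                  0x20, 4,  0 },
    { "-fomit-frame-pointer", 0x4c, 1, 28 },
};

/* Every instruction is 4 bytes and the function starts at address 0. */
static unsigned insn_count(const struct listing *l)
{
    return l->last_addr / 4 + 1;
}

/* Stack used by the function itself: pushed registers plus locals. */
static unsigned stack_bytes(const struct listing *l)
{
    return l->pushed_regs * 4 + l->sub_sp;
}
```

That works out to 21 instructions and 32 bytes of stack with no flags, 20 instructions and 32 bytes with -fomit-frame-pointer, and 9 instructions and 16 bytes with -O1.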