Why does ARM gcc push the registers r3 and lr onto the stack at the beginning of the function?

I wrote a simple test program (main.c):

    void test(){
    }
    void main(){
        test();
    }

Then I compiled it with arm-none-eabi-gcc and used objdump to get the generated assembly:

    arm-none-eabi-gcc -g -fno-defer-pop -fomit-frame-pointer -c main.c
    arm-none-eabi-objdump -S main.o > output

The assembly code pushes the registers r3 and lr, even though the function does nothing.

    main.o:     file format elf32-littlearm

    Disassembly of section .text:

    00000000 <test>:
    void test(){
    }
       0:   e12fff1e    bx      lr

    00000004 <main>:
    void main(){
       4:   e92d4008    push    {r3, lr}
        test();
       8:   ebfffffe    bl      0 <test>
    }
       c:   e8bd4008    pop     {r3, lr}
      10:   e12fff1e    bx      lr

My question is: why does ARM gcc push r3 onto the stack, even though the test() function never uses it? Does gcc just randomly select one register to push? If it is for stack alignment (8 bytes on ARM), why not just subtract from sp? Thanks.

=================== Update ===========================

@KemyLand, regarding your answer, I have another example. Source code:

    void test1(){
    }
    void test(int i){
        test1();
    }
    void main(){
        test(1);
    }

I used the same compilation commands as above and got the following assembly:

    main.o:     file format elf32-littlearm

    Disassembly of section .text:

    00000000 <test1>:
    void test1(){
    }
       0:   e12fff1e    bx      lr

    00000004 <test>:
    void test(int i){
       4:   e52de004    push    {lr}            ; (str lr, [sp, #-4]!)
       8:   e24dd00c    sub     sp, sp, #12
       c:   e58d0004    str     r0, [sp, #4]
        test1();
      10:   ebfffffe    bl      0 <test1>
    }
      14:   e28dd00c    add     sp, sp, #12
      18:   e49de004    pop     {lr}            ; (ldr lr, [sp], #4)
      1c:   e12fff1e    bx      lr

    00000020 <main>:
    void main(){
      20:   e92d4008    push    {r3, lr}
        test(1);
      24:   e3a00001    mov     r0, #1
      28:   ebfffffe    bl      4 <test>
    }
      2c:   e8bd4008    pop     {r3, lr}
      30:   e12fff1e    bx      lr

If push {r3, lr} is used in the first example to save instructions, why didn't the compiler use just one instruction in this test() function?

    push {r0, lr}

It uses 3 instructions instead of 1.

    push {lr}
    sub sp, sp, #12
    str r0, [sp, #4]

By the way, why does it subtract 12 from sp? The stack is 8-byte aligned, so couldn't it just subtract 4 instead?

1 answer

According to the standard ARM Embedded ABI, r0 through r3 are used to pass arguments to a function and to hold its return value, and lr (a.k.a. r14) is the link register, whose purpose is to hold the return address for a function.

Obviously, lr must be saved: the bl to test() overwrites it, and otherwise main() would not be able to return to its caller.

Now, it is well known that every ARM instruction is 32 bits wide and, as you mentioned, ARM requires the call stack to be 8-byte aligned. As a bonus, we are using the Embedded ARM ABI, so code size should be optimized. Thus, it is more efficient to use a single 32-bit instruction that both saves lr and aligns the stack by pushing an unused register (r3 is not needed, because test() takes no arguments and returns nothing), followed by a single 32-bit pop, instead of adding extra instructions (and therefore wasting precious memory!) to manipulate the stack pointer.

In the end, it's pretty logical to conclude that this is just an optimization from GCC.
