Understanding the assembly translation of an empty main()

Can someone explain what GCC is doing for this piece of code? What is this initialization? Source:

 #include <stdio.h>
 int main()
 {
 }

And it was translated into:

         .file "test1.c"
         .def ___main; .scl 2; .type 32; .endef
         .text
 .globl _main
         .def _main; .scl 2; .type 32; .endef
 _main:
         pushl %ebp
         movl %esp, %ebp
         subl $8, %esp
         andl $-16, %esp
         movl $0, %eax
         addl $15, %eax
         addl $15, %eax
         shrl $4, %eax
         sall $4, %eax
         movl %eax, -4(%ebp)
         movl -4(%ebp), %eax
         call __alloca
         call ___main
         leave
         ret

I would appreciate it if a compiler/assembly guru could start by explaining the stack, the registers, and the section initialization. I can't make head or tail of this code.

EDIT: I am using gcc 3.4.5, and the command line is gcc -S test1.c

Thanks, kunjaan.

c assembly gcc compiler-construction
May 05 '09 at 3:07
5 answers

I should preface all my comments by saying that I am still learning this myself.

I will skip the section initialization. An explanation of it, and of basically everything I explain below, can be found here: http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax

The ebp register is the base pointer of the stack frame, hence BP. It holds a pointer to the base of the current stack frame.

The esp register is the stack pointer. It holds the address of the top of the stack. Each time we push something onto the stack, esp is updated so that it always points to the top of the stack.

So ebp points to the base and esp points to the top, and the stack looks like this:

 esp -----> 000a3 fa
            000a4 21
            000a5 66
            000a6 23
 ebp -----> 000a7 54

If you push e4 onto the stack, this is what happens:

 esp -----> 000a2 e4
            000a3 fa
            000a4 21
            000a5 66
            000a6 23
 ebp -----> 000a7 54

Note that the stack grows toward lower addresses; this fact will be important below.

The first two steps are known as the procedure prologue or, more commonly, the function prologue; they prepare the stack for use by local variables. See the explanation of the procedure prologue below.

In step 1, we save the old frame pointer on the stack with pushl %ebp. Since main is the first function of our program (it is called by the C runtime's startup code), I have no idea what the previous %ebp value is.

Step 2: We create a new stack frame because we are entering a new function (main). Therefore, we must set a new frame base pointer. We use the current value of esp as the base of our stack frame.

Step 3 allocates 8 bytes of space on the stack. As mentioned above, the stack grows toward lower addresses, so subtracting 8 moves the top of the stack down by 8 bytes.

Step 4 aligns the stack. I found different opinions on this, and I'm not quite sure why it is done. I suspect it is done so that large (SIMD) instructions can operate on data on the stack:

http://gcc.gnu.org/ml/gcc/2008-01/msg00282.html

This code "and"s ESP with 0xFFFFFFF0, aligning the stack to the next lower 16-byte boundary. Examination of the MinGW source code shows that this may be for SIMD instructions appearing in the "_main" routine, which work only on aligned addresses. Since our routine does not contain SIMD instructions, this line is not required.

http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax

Steps 5 through 11 seem to serve no purpose as far as I can tell. I could not find any explanation on Google. Maybe someone who really knows this material can provide a deeper understanding. I have heard rumors that this is used for handling C exceptions.

Step 5 stores main's return value, 0, in eax.

Steps 6 and 7, for some unknown reason, each add 15 (0xF) to eax: eax = 01111 + 01111 = 11110 in binary.

Step 8 shifts eax right by 4 bits: eax = 00001, because the low four bits of 1|1110 are shifted off the end.

Step 9 shifts eax left by 4 bits: eax = 10000 (16 in decimal).

Steps 10 and 11 store eax into the first 4 allocated bytes on the stack (-4(%ebp)) and then load that value back into eax.

Steps 12 and 13 set up the C library.

We have reached the function epilogue, the part of the function that restores the stack pointers, esp and ebp, to the state they were in before this function was called.

Step 14, leave, first copies ebp into esp, discarding everything main allocated on the stack. It then pops the value we saved in step 1 back into ebp, leaving esp pointing at the return address, where it was when main was called.

leave can also be replaced with the following instructions:

 mov %ebp, %esp
 pop %ebp

Step 15 returns and exits the function.

  1. pushl %ebp
  2. movl %esp, %ebp
  3. subl $8, %esp
  4. andl $-16, %esp
  5. movl $0, %eax
  6. addl $15, %eax
  7. addl $15, %eax
  8. shrl $4, %eax
  9. sall $4, %eax
 10. movl %eax, -4(%ebp)
 11. movl -4(%ebp), %eax
 12. call __alloca
 13. call ___main
 14. leave
 15. ret

Procedure prologue:

The first thing a function needs to do is called the procedure prologue. It first saves the current base pointer (ebp) with the instruction pushl %ebp (remember that ebp is the register used to access function parameters and local variables). Next it copies the stack pointer (esp) to the base pointer (ebp) with the instruction movl %esp, %ebp. This allows function parameters to be accessed as offsets from the base pointer. Local variables are at negative offsets from ebp, for example -4(%ebp) for the first local variable; the return address is always at 4(%ebp); each parameter or argument N is at N*4+4(%ebp), for example 8(%ebp) for the first argument; and the old ebp is at (%ebp).

http://www.milw0rm.com/papers/52

There is a really great thread that answers many of these questions: "Why are there more instructions in my gcc output?"

A good link to x86 machine code instructions can be found here: http://programminggroundup.blogspot.com/2007/01/appendix-b-common-x86-instructions.html

This is a lecture that covers some of these ideas: http://csc.colstate.edu/bosworth/cpsc5155/Y2006_TheFall/MySlides/CPSC5155_L23.htm

Here is another answer to your question: http://www.phiral.net/linuxasmone.htm

None of these sources explain everything.

May 05 '09 at 5:16

Here's a nice step-by-step breakdown of a simple main() function compiled by GCC, with lots of details: GAS Syntax (Wikibooks)

For the code you posted, the instructions break down as follows:

  • The first four instructions (pushl through andl): setting up a new stack frame
  • The next five instructions (movl through sall): generate a strange value in eax that will become the return value (I have no idea why it decided to do this)
  • The following two commands (both movl): save the calculated return value to a temporary variable on the stack
  • The following two commands (both calls): call the C library init functions
  • leave instruction: tears down the stack frame
  • ret instruction: returns to the caller (the C runtime's startup function, or possibly a kernel function that invoked your program)
May 05 '09 at 3:42 a.m.

Well, I don't know much about GAS, and I'm a little rusty on Intel assembly, but it looks like it's initializing main's stack frame.

If you look, __main is some kind of routine that must perform initialization. Then, since main's body is empty, it executes leave and ret to return to the function that called main.

From http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax#.22hello.s.22_line-by-line :

  _main:

This line declares the label "_main", marking the location that is invoked from the startup code.

  pushl %ebp
  movl %esp, %ebp
  subl $8, %esp

These lines store the EBP value on the stack, then move the ESP value into EBP, and then subtract 8 from ESP. An "l" at the end of each opcode indicates that we want the version of the opcode that works with "long" (32-bit) operands.

  andl $-16, %esp 

This code "and"s ESP with 0xFFFFFFF0, aligning the stack to the next lower 16-byte boundary (required when using SIMD instructions, not useful here).

  movl $0, %eax
  movl %eax, -4(%ebp)
  movl -4(%ebp), %eax

This code moves zero into EAX, then moves EAX to the memory location EBP-4, which is in the temporary space we reserved on the stack at the beginning of the procedure. It then moves the value at EBP-4 back into EAX; obviously this is not optimized code.

  call __alloca
  call ___main

These functions are part of the C library setup. Since we call functions in the C library, we probably need them. The exact operations they perform depend on the platform and on the version of the GNU tools installed.

Here is a useful link.

http://unixwiz.net/techtips/win32-callconv-asm.html

May 05 '09 at 3:18

It really helps to know which version of gcc and which libc you are using. It looks like you have a very old version of gcc, or a weird platform, or both. What's going on is some weirdness with calling conventions. I can tell you a few things:

Save the frame pointer on the stack, by convention:

 pushl %ebp
 movl %esp, %ebp

Make room for things at the old end of the frame, and round the stack pointer down to a multiple of 16 (why this is needed, I don't know):

 subl $8, %esp
 andl $-16, %esp

With a crazy song and dance, get ready to return 1 from main:

 movl $0, %eax
 addl $15, %eax
 addl $15, %eax
 shrl $4, %eax
 sall $4, %eax
 movl %eax, -4(%ebp)
 movl -4(%ebp), %eax

Recover any memory allocated with alloca (GNU-ism):

 call __alloca 

Tell libc that main is exiting (more GNU-isms):

 call ___main 

Restore the frame and stack pointers:

 leave 

Return:

 ret 

This is what happens when I compile the same source code with gcc 4.3 on Debian Linux:

         .file "main.c"
         .text
         .p2align 4,,15
 .globl main
         .type main, @function
 main:
         leal 4(%esp), %ecx
         andl $-16, %esp
         pushl -4(%ecx)
         pushl %ebp
         movl %esp, %ebp
         pushl %ecx
         popl %ecx
         popl %ebp
         leal -4(%ecx), %esp
         ret
         .size main, .-main
         .ident "GCC: (Debian 4.3.2-1.1) 4.3.2"
         .section .note.GNU-stack,"",@progbits

And I'll break it down like this:

Tell the debugger and other tools the source file:

  .file "main.c" 

The code is in the text section:

  .text 

Beats me:

  .p2align 4,,15 

main is an exported function:

 .globl main
         .type main, @function

main's entry point:

 main: 

Take the return address, align the stack to a 16-byte boundary, and save the return address again (why, I can't say):

  leal 4(%esp), %ecx
  andl $-16, %esp
  pushl -4(%ecx)

Save the frame pointer using the standard convention:

  pushl %ebp
  movl %esp, %ebp

The incomprehensible madness:

  pushl %ecx
  popl %ecx

Restore the frame pointer and stack pointer:

  popl %ebp
  leal -4(%ecx), %esp

Return:

  ret 

Additional information for the debugger:

         .size main, .-main
         .ident "GCC: (Debian 4.3.2-1.1) 4.3.2"
         .section .note.GNU-stack,"",@progbits

By the way, main is special and magical; when I compile

 int f(void) { return 17; } 

I get something a little more normal:

         .file "f.c"
         .text
         .p2align 4,,15
 .globl f
         .type f, @function
 f:
         pushl %ebp
         movl $17, %eax
         movl %esp, %ebp
         popl %ebp
         ret
         .size f, .-f
         .ident "GCC: (Debian 4.3.2-1.1) 4.3.2"
         .section .note.GNU-stack,"",@progbits

There's still a ton of decoration, and we still save the frame pointer, move it, and restore it, which is completely pointless, but the rest of the code makes sense.

May 05 '09 at 3:59 a.m.

It seems that GCC behaves as if it edits main() to include CRT initialization code. I just confirmed that I get the same assembly listing from MinGW GCC 3.4.5 here with your source code.

The command I use is:

 gcc -S emptymain.c

Interestingly, if I rename the function to qqq() instead of main(), I get the following assembly:

         .file "emptymain.c"
         .text
 .globl _qqq
         .def _qqq;  .scl 2;  .type 32;  .endef
 _qqq:
         pushl %ebp
         movl %esp, %ebp
         popl %ebp
         ret

which makes sense for an empty function without enabling optimizations.

May 05 '09 at 3:54


