Why does GCC on x86-64 insert a NOP inside a function?

Given the following C function:

    #include <string.h>

    void go(char *data) {
        char name[64];
        strcpy(name, data);
    }

GCC 5 and 6 on x86-64 compile this (plain gcc -c -g -o followed by objdump) to:

    0000000000000000 <go>:
       0:   55                      push   %rbp
       1:   48 89 e5                mov    %rsp,%rbp
       4:   48 83 ec 50             sub    $0x50,%rsp
       8:   48 89 7d b8             mov    %rdi,-0x48(%rbp)
       c:   48 8b 55 b8             mov    -0x48(%rbp),%rdx
      10:   48 8d 45 c0             lea    -0x40(%rbp),%rax
      14:   48 89 d6                mov    %rdx,%rsi
      17:   48 89 c7                mov    %rax,%rdi
      1a:   e8 00 00 00 00          callq  1f <go+0x1f>
      1f:   90                      nop
      20:   c9                      leaveq
      21:   c3                      retq

Is there any reason for GCC to insert the 90 / nop at 1f, or is it just a side effect that can occur when optimization is turned off?

Note: This question differs from most others on the topic because it asks about a nop inside the function body, not about padding outside it.

Checked compiler versions: GCC Debian 5.3.1-14 (5.3.1) and Debian 6-20160313-1 (6.0.0)

1 answer

That's strange; I had never noticed a stray nop in the asm output at -O0 before. (Probably because I don't spend my time looking at unoptimized compiler output.)

Usually, nops inside functions are there to align branch targets, including function entry points, as in the question Brian linked. (Also see -falign-loops in the gcc docs, which is enabled at -O2 and -O3 but not at -Os.)


In this case, the nop is part of the compiler noise for a bare empty function:

    void go(void) {
        //char name[64];
        //strcpy(name, data);
    }

        push  rbp
        mov   rbp, rsp
        nop               # only present for gcc5, not gcc 4.9.3
        pop   rbp
        ret

See this code on the Godbolt Compiler Explorer, where you can check the asm for other compiler versions and compile options.

(The frame setup is not technically noise: -O0 implies -fno-omit-frame-pointer, so at -O0 even empty functions set up and tear down a stack frame.)


Of course, that nop is absent at any non-zero optimization level. In the code in the question, it offers no debugging or performance advantage. (See the performance guide links in the x86 tag wiki, especially Agner Fog's microarchitecture guide, to learn what makes code fast on current CPUs.)

My guess is that this is just an artifact of gcc internals. The nop appears as a literal nop in the gcc -S asm output, not as a .p2align directive. gcc itself does not count machine-code bytes; it just emits alignment directives at certain points to align important branch targets. Only the assembler knows how large a nop is actually needed to reach the requested alignment.
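The byte counting that is left to the assembler can be sketched in C. This is only an illustration of the arithmetic behind a power-of-two alignment directive, not gcc's or the assembler's actual code; the helper name padding_for_p2align is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of padding bytes needed so that `offset`
   becomes a multiple of 2^log2_align. This is the quantity an assembler
   has to work out when it meets an alignment directive such as .p2align. */
uint64_t padding_for_p2align(uint64_t offset, unsigned log2_align)
{
    assert(log2_align < 64);                 /* keep the shift well defined */
    uint64_t align = (uint64_t)1 << log2_align;
    /* Distance to the next multiple of `align`; 0 if already aligned. */
    return (align - (offset & (align - 1))) & (align - 1);
}
```

Applied to the listing in the question, padding_for_p2align(0x1f, 4) gives 1, so that nop does happen to leave the next instruction 16-byte aligned. In the empty-function example, though, push and mov occupy 4 bytes (going by the encodings in the question's listing), so the nop sits at offset 4 and aligns nothing, which fits the reading that this nop is not alignment padding.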

The default, -O0, tells gcc that you want it to compile quickly rather than generate good code. As a result, the asm output tells you more about gcc internals than it does at other -O levels, and very little about how to optimize or anything else.

If you are trying to learn asm, the output of -Og (optimize for debugging), for example, is more interesting to look at.
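A tiny function is enough to try this out. The sketch below uses a hypothetical function, add_offset, and an assumed file name; the exact asm depends on the compiler version. At -O0 you should expect the kind of store-then-reload of arguments that the question's listing shows for %rdi.

```c
/* Hypothetical example; the function and file names are illustrative only.
   Try:  gcc -O0 -S add_offset.c   and then   gcc -Og -S add_offset.c
   and compare the two resulting add_offset.s files. */
int add_offset(int x, int offset)
{
    int result = x + offset;   /* a named local, kept for easier debugging */
    return result;
}
```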

If you are trying to see how well gcc or clang does at generating code, look at -O3 -march=native (or -O2 -mtune=intel, or whatever settings you build your project with). Puzzling out the optimizations made at -O3 is a good way to learn some asm tricks, though. -fno-tree-vectorize is handy if you want to see a non-vectorized version of something that is otherwise fully optimized.
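A simple reduction loop is the kind of code where -fno-tree-vectorize makes a visible difference. The function sum_ints and the file name below are hypothetical, chosen only for illustration; whether gcc actually vectorizes the loop depends on the version and target options.

```c
#include <stddef.h>

/* Hypothetical example: a plain summation loop that gcc's auto-vectorizer
   may turn into SIMD instructions at -O3.
   Try:  gcc -O3 -S sum.c   and then   gcc -O3 -fno-tree-vectorize -S sum.c */
int sum_ints(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

Comparing the two outputs side by side shows which instructions come from vectorization and which come from the other -O3 optimizations.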
