I wrote this snippet during a recent argument about the supposed speed difference between array[i++] and array[i]; i++.
    int array[10];

    int main(){
        int i=0;
        while(i < 10){
            array[i] = 0;
            i++;
        }
        return 0;
    }
Snippet on Compiler Explorer: https://godbolt.org/g/de7TY2
As expected, the compiler outputs identical asm for array[i++] and array[i]; i++ at -O1 and above. However, I was surprised that the placement of xor eax, eax within the function seems to be random at higher optimization levels.
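For reference, the two forms being compared can be sketched side by side as below. The names fill_postinc, fill_separate, array_a, array_b and N are hypothetical and only exist for this illustration; they are not part of the snippet on Compiler Explorer. The sketch only shows that both forms store the same values. Whether a given compiler emits identical asm for them still depends on the compiler and flags.

```c
#define N 10

/* Two separate arrays so each variant's result can be inspected
   independently (names are illustrative only). */
int array_a[N];
int array_b[N];

/* Variant 1: the increment is folded into the subscript expression. */
void fill_postinc(int value) {
    int i = 0;
    while (i < N) {
        array_a[i++] = value;
    }
}

/* Variant 2: store first, then increment in a separate statement. */
void fill_separate(int value) {
    int i = 0;
    while (i < N) {
        array_b[i] = value;
        i++;
    }
}
```

Compiling each function with the same flags and diffing the output is one way to check the "identical asm" claim for a particular compiler version.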
GCC
At -O2, GCC puts the xor directly before ret, as expected:
    mov DWORD PTR [rax], 0
    add rax, 4
    cmp rax, OFFSET FLAT:array+40
    jne .L2
    xor eax, eax
    ret
However, at -O3 it puts the xor after the second mov:
    mov QWORD PTR array[rip], 0
    mov QWORD PTR array[rip+8], 0
    xor eax, eax
    mov QWORD PTR array[rip+16], 0
    mov QWORD PTR array[rip+24], 0
    mov QWORD PTR array[rip+32], 0
    ret
ICC
ICC places it normally at -O1:
    push rsi
    xor esi, esi
    push 3
    pop rdi
    call __intel_new_feature_proc_init
    stmxcsr DWORD PTR [rsp]
    xor eax, eax
    or DWORD PTR [rsp], 32832
    ldmxcsr DWORD PTR [rsp]
    ..B1.2:
    mov DWORD PTR [array+rax*4], 0
    inc rax
    cmp rax, 10
    jl ..B1.2
    xor eax, eax
    pop rcx
    ret
but in a strange place at -O2:
    push rbp
    mov rbp, rsp
    and rsp, -128
    sub rsp, 128
    xor esi, esi
    mov edi, 3
    call __intel_new_feature_proc_init
    stmxcsr DWORD PTR [rsp]
    pxor xmm0, xmm0
    xor eax, eax
    or DWORD PTR [rsp], 32832
    ldmxcsr DWORD PTR [rsp]
    movdqu XMMWORD PTR array[rip], xmm0
    movdqu XMMWORD PTR 16+array[rip], xmm0
    mov DWORD PTR 32+array[rip], eax
    mov DWORD PTR 36+array[rip], eax
    mov rsp, rbp
    pop rbp
    ret
and at -O3:
    and rsp, -128
    sub rsp, 128
    mov edi, 3
    call __intel_new_proc_init
    stmxcsr DWORD PTR [rsp]
    xor eax, eax
    or DWORD PTR [rsp], 32832
    ldmxcsr DWORD PTR [rsp]
    mov rsp, rbp
    pop rbp
    ret
Clang
Only Clang puts the xor directly before ret at every optimization level:
    xorps xmm0, xmm0
    movaps xmmword ptr [rip + array+16], xmm0
    movaps xmmword ptr [rip + array], xmm0
    mov qword ptr [rip + array+32], 0
    xor eax, eax
    ret
Since both GCC and ICC do this at higher optimization levels, I assume there must be a good reason for it.
Why do some compilers do this?
Of course, the code is semantically identical either way, and the compiler is free to reorder it as it sees fit. But since the placement only changes at higher optimization levels, it must be the result of some optimization.