There are two parts:
First, does the compiler optimize this?
Run the experiment:
test.cc
    #include <random>
    #include "test2.h"

    int main() {
        std::default_random_engine e;
        std::uniform_int_distribution<int> d(0,1);
        int flag = d(e);

        int x = 0;
        int a = 1;

        if (flag) {
            x += a;
            doA(x);
            return x;
        } else {
            x += a;
            doB(x);
            return x;
        }
    }
test2.h
    void doA(int& x);
    void doB(int& x);
test2.cc
    void doA(int& x) {}
    void doB(int& x) {}
test2.cc and test2.h exist solely to keep the compiler from optimizing everything away. Because doA and doB are defined in another translation unit, the compiler cannot be sure they have no side effects.
Now we generate the assembly:
gcc -std=c++11 -S test.cc
And let's move on to the interesting part of the assembly:
    call _ZNSt24uniform_int_distributionIiEclISt26linear_congruential_engineImLm16807ELm0ELm2147483647EEEEiRT_
    movl %eax, -40(%rbp); <- setting flag
    movl $0, -44(%rbp);   <- setting x
    movl $1, -36(%rbp);   <- setting a
    cmpl $0, -40(%rbp);   <- first part of if (flag)
    je .L2;               <- second part of if (flag)
    movl -44(%rbp), %edx  <- setting up x
    movl -36(%rbp), %eax  <- setting up a
    addl %edx, %eax       <- adding x and a
    movl %eax, -44(%rbp)  <- assigning back to x
    leaq -44(%rbp), %rax  <- grabbing address of x
    movq %rax, %rdi       <- bookkeeping for function call
    call _Z3doARi         <- function call doA
    movl -44(%rbp), %eax
    jmp .L4
    .L2:
    movl -44(%rbp), %edx  <- setting up x
    movl -36(%rbp), %eax  <- setting up a
    addl %edx, %eax       <- perform the addition
    movl %eax, -44(%rbp)  <- move it back to x
    leaq -44(%rbp), %rax  <- and so on
    movq %rax, %rdi
    call _Z3doBRi
    movl -44(%rbp), %eax
    .L4:
So we can see that the compiler did not optimize it. But we also did not ask it to.
g++ -std=c++11 -S -O3 test.cc
and then the interesting part of the assembly:
    main:
    .LFB4729:
        .cfi_startproc
        subq $56, %rsp
        .cfi_def_cfa_offset 64
        leaq 32(%rsp), %rdx
        leaq 16(%rsp), %rsi
        movq $1, 16(%rsp)
        movq %fs:40, %rax
        movq %rax, 40(%rsp)
        xorl %eax, %eax
        movq %rdx, %rdi
        movl $0, 32(%rsp)
        movl $1, 36(%rsp)
        call _ZNSt24uniform_int_distributionIiEclISt26linear_congruential_engineImLm16807ELm0ELm2147483647EEEEiRT_RKNS0_10param_typeE
        testl %eax, %eax
        movl $1, 12(%rsp)
        leaq 12(%rsp), %rdi
        jne .L83
        call _Z3doBRi
        movl 12(%rsp), %eax
    .L80:
        movq 40(%rsp), %rcx
        xorq %fs:40, %rcx
        jne .L84
        addq $56, %rsp
        .cfi_remember_state
        .cfi_def_cfa_offset 8
        ret
    .L83:
        .cfi_restore_state
        call _Z3doARi
        movl 12(%rsp), %eax
        jmp .L80
It is a bit beyond my ability to show a clean 1 to 1 relationship between this assembly and the code, but you can tell from the calls to doA and doB that the setup is common to both branches and is done outside the if statement (above the line jne .L83): the value 1 is stored straight into x (movl $1, 12(%rsp)) and its address is loaded into %rdi before the branch. So yes, compilers do perform this optimization.
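As a rough source-level picture of what the -O3 output corresponds to, here is a minimal sketch. The function names branchy, hoisted and same_result are made up for illustration, and doA and doB are given arbitrary non-empty bodies (an assumption, unlike the empty ones in test2.cc) so that the two versions can actually be compared. This is not what the compiler literally produces, only an equivalent rewrite of the source.

```cpp
// Hypothetical stand-ins for doA/doB: the bodies are invented here so that
// the result depends on which branch was taken.
void doA(int& x) { x *= 2; }
void doB(int& x) { x += 10; }

// Same shape as the code in test.cc: the addition appears in both branches.
int branchy(bool flag) {
    int x = 0;
    int a = 1;
    if (flag) {
        x += a;
        doA(x);
        return x;
    } else {
        x += a;
        doB(x);
        return x;
    }
}

// Hand-written equivalent with the common work moved above the branch,
// which is roughly what the optimized assembly does.
int hoisted(bool flag) {
    int x = 0;
    int a = 1;
    x += a;            // common to both branches, done once
    if (flag) {
        doA(x);
    } else {
        doB(x);
    }
    return x;
}

// Helper (illustrative only): true if both versions agree for a given flag.
bool same_result(bool flag) {
    return branchy(flag) == hoisted(flag);
}
```

Both versions return the same value for either value of flag, which is why the compiler is free to make this transformation on its own.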
Part 2:
How can we know whether CPUs can perform this optimization when presented with the first version of the code?
Actually, I don't know how to check this, so I don't know. I would rate it as plausible, given that out-of-order and speculative execution exist. But the proof is in the pudding, and I have no way to test this pudding. Therefore, I am reluctant to make a claim one way or the other.