The first version performs an optimization by moving the value from memory into a local variable. The second version does not.
I was expecting that the compiler might choose to do the localValue optimization here anyway, instead of reading and writing the value from memory on every iteration of the loop. Why doesn't it?
    class Example {
    public:
        void processSamples(float * x, int num) {
            float localValue = v1;
            for (int i = 0; i < num; ++i) {
                x[i] = x[i] + localValue;
                localValue = 0.5 * x[i];
            }
            v1 = localValue;
        }

        void processSamples2(float * x, int num) {
            for (int i = 0; i < num; ++i) {
                x[i] = x[i] + v1;
                v1 = 0.5 * x[i];
            }
        }

        float v1;
    };
processSamples compiles to the following loop:
    .L4:
        addss   xmm0, DWORD PTR [rax]
        movss   DWORD PTR [rax], xmm0
        mulss   xmm0, xmm1
        add     rax, 4
        cmp     rax, rcx
        jne     .L4
processSamples2:
    .L5:
        movss   xmm0, DWORD PTR [rax]
        addss   xmm0, DWORD PTR example[rip]
        movss   DWORD PTR [rax], xmm0
        mulss   xmm0, xmm1
        movss   DWORD PTR example[rip], xmm0
        add     rax, 4
        cmp     rax, rdx
        jne     .L5
Since v1 is not atomic, the compiler does not have to worry about other threads. Couldn't it just assume that nothing else will look at this value, and go ahead and keep it in a register while the loop runs?
See https://godbolt.org/g/RiF3B4 for the full assembly and a selection of compilers to choose from!
c++ optimization g++ clang clang++
Jcx