Strange performance difference in C++?

Question

I just stumbled upon a change that seems to have counterintuitive performance effects. Can someone give a possible explanation for this behavior?

Source:

    for (int i = 0; i < ct; ++i) {
        // do some stuff...
        int iFreq = getFreq(i);
        double dFreq = iFreq;
        if (iFreq != 0) {
            // do some stuff with iFreq...
            // do some calculations with dFreq...
        }
    }

While cleaning up this code during a "performance pass", I decided to move the definition of dFreq inside the if block, since it was only used there. There are several calculations involving dFreq, so I did not eliminate it entirely, because it saves the cost of converting from int to double multiple times. I expected no difference in performance or, if anything, a slight improvement. However, performance dropped by almost 10%. I have measured this many times, and this is really the only change I made. The code snippet shown above runs inside a couple of other loops. I get very consistent timings across runs and can confirm that the change described here reduces performance by about 10%. I expected performance to improve because the conversion from int to double would then happen only when iFreq != 0.

Changed code:

    for (int i = 0; i < ct; ++i) {
        // do some stuff...
        int iFreq = getFreq(i);
        if (iFreq != 0) {
            // do some stuff with iFreq...
            double dFreq = iFreq;
            // do some stuff with dFreq...
        }
    }

Can anyone explain this? I am using VC++ 9.0 with /O2. I just want to understand what I'm not taking into account here.

+6

c++ performance optimization

user123456 Feb 05 '10 at 18:59

8 answers

Could the getFreq result be kept in a register in the first case but written to memory in the second? It may also be that the performance degradation is related to processor mechanisms such as pipelining and/or branch prediction. You could check the generated assembly code.

+6

MartinStettner Feb 05 '10 at 19:10

It looks like a pipeline stall.

    int iFreq = getFreq(i);
    double dFreq = iFreq;
    if (iFreq != 0) {

This lets the conversion to double proceed in parallel with other code, since dFreq is not used immediately. It gives the compiler something to schedule between storing iFreq and using it, so the conversion is most likely "free".

But

    int iFreq = getFreq(i);
    if (iFreq != 0) {
        // do some stuff with iFreq...
        double dFreq = iFreq;
        // do some stuff with dFreq...
    }

There may be a store/load stall after the conversion to double, since you start using the double value immediately.

Modern processors can perform several operations per cycle, but only when those operations are independent. Two consecutive instructions that reference the same register often cause a stall. The conversion to double itself may take 3 clock cycles, but all but the first cycle can overlap with other work, as long as you do not reference the conversion result for an instruction or two.

C++ compilers are fairly good at reordering instructions to take advantage of this; it looks like your change defeated a good optimization.

Another (less likely) possibility is that when the conversion to double came before the branch, the compiler was able to remove the branch completely. Branchless code is often a major performance win on modern processors.

It would be interesting to see what instructions the compiler actually emitted for these two cases.

+4

John Knoeller Feb 05 '10 at 21:25

Try moving the dFreq definition outside the for loop, but keep the assignment inside the for/if block.

Perhaps creating dFreq on the stack inside the if, within the loop, is causing the problem (although the compiler should take care of this). It may be a regression in the compiler: if dFreq is declared in the for loop, it is created once, but inside the if it is created every time.

    double dFreq;
    int iFreq;
    for (int i = 0; i < ct; ++i) {
        // do some stuff...
        iFreq = getFreq(i);
        if (iFreq != 0) {
            // do some stuff with iFreq...
            dFreq = iFreq;
            // do some stuff with dFreq...
        }
    }

+3

Gregor Brandt Feb 05 '10 at 19:23

Perhaps the compiler optimizes it by hoisting the definition outside the for loop. When you put it inside the if, the compiler's optimizer doesn't do that.

+2

John Boker Feb 05 '10 at 19:10

There is a chance that this change caused your compiler to disable some optimizations. What happens if you move your declarations above the loop?

+1

mxg Feb 05 '10 at 19:12

I once read an optimization document stating that defining variables just before their use, and not earlier, was good practice, because compilers can optimize code that follows this advice.

This article (a bit dated, but still fairly reliable) says something similar, with statistics: http://www.tantalon.com/pete/cppopt/asyougo.htm#PostponeVariableDeclaration

+1

Klaim Feb 05 '10 at 19:15

It is easy enough to find out. Just take about 20 stack samples ("stackshots") of both the slow version and the fast version. In the slow version, about 2 of the samples will show it doing something that the fast version does not do. You will see a subtle difference in where it stops in the assembly code.
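The "about 2" figure follows from simple arithmetic, assuming the slow version takes roughly 10% longer than the fast one (the question's measurement). The extra work then occupies about 0.1/1.1 of the slow version's run time, so out of 20 random samples you would expect roughly:

$$
20 \times \frac{0.1}{1.1} \approx 1.8 \approx 2
$$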

+1

Mike Dunlavey Feb 05 '10 at 19:46

phkahler · Accepted Answer · 2010-02-05T19:32:17+0000

You should put the conversion to dFreq immediately inside the if (), before doing the calculations with iFreq. The conversion can execute in parallel with the integer calculations if the instruction is earlier in the code. A good compiler might move it earlier on its own, while a less capable one might just leave it where it is. Since you moved it after the integer calculations, it cannot run in parallel with the integer code, which leads to the slowdown. Even if it does run in parallel, there may be no improvement at all over the original, depending on the CPU (issuing an FP instruction whose result is never used costs little in the original version).

If you really want better performance: many people have run benchmarks and rank the compilers in the following order:

1) ICC: the Intel compiler
2) GCC: a good second place
3) MSVC: its generated code can be pretty bad compared to the others

You can also try -O3, if your compiler supports it.
