Do processors really calculate multiplication by zero or one? Why?

Short version

In the following line:

aData[i] = aData[i] + ( aOn * sin( i ) ); 

If aOn is 0 or 1, does the processor really perform the multiplication, or does it conditionally short-circuit to the result (0 for 0, the other operand for 1)?

Long version

I am looking into the consistency of an algorithm's performance, which is partly affected by branch prediction.

The assumption is that this code:

 for ( i = 0; i < iNumSamples; i++ ) aData[i] = aData[i] + ( aOn * sin( i ) ); 

will provide more stable performance than this code (where branch prediction can destabilize performance):

 for ( i = 0; i < iNumSamples; i++ ) { if ( aOn ) aData[i] = aData[i] + sin( i ); } 

where aOn is either 0 or 1, and can be toggled by another thread while the loop is executing.

The actual conditional calculation ( + sin( i ) in the example above) involves much more processing, and the if condition must stay inside the loop: there are many conditions, not just the one shown above, and a change to aOn should take effect immediately rather than only once per run of the loop.
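The two variants described above can be sketched as self-contained functions. This is only an illustration under assumptions that are not in the question: the names addBranchless and addBranching are hypothetical, aData is taken to be a std::vector<double>, and aOn is declared as std::atomic<int> because a flag that another thread writes while the loop reads it needs some form of synchronization in C++.

```cpp
#include <atomic>
#include <cmath>
#include <cstddef>
#include <vector>

// Variant 1 (hypothetical name): no branch in the loop body.
// The multiplication by aOn is executed on every iteration.
void addBranchless(std::vector<double>& aData, const std::atomic<int>& aOn) {
    for (std::size_t i = 0; i < aData.size(); i++) {
        // aOn is re-read on each iteration, so a write from another thread
        // can take effect in the middle of the loop.
        const int on = aOn.load(std::memory_order_relaxed);
        aData[i] = aData[i] + on * std::sin(static_cast<double>(i));
    }
}

// Variant 2 (hypothetical name): a conditional branch guards the update.
void addBranching(std::vector<double>& aData, const std::atomic<int>& aOn) {
    for (std::size_t i = 0; i < aData.size(); i++) {
        if (aOn.load(std::memory_order_relaxed)) {
            aData[i] = aData[i] + std::sin(static_cast<double>(i));
        }
    }
}
```

Both functions leave aData unchanged while aOn is 0 and add sin( i ) while it is 1. Note that the first variant still evaluates sin( i ) on every iteration, even when the result is then multiplied by 0.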

Ignoring performance consistency, the trade-off between the two options is the time taken to execute the if versus the time taken to execute the multiplication.

Regardless, it is easy to see that if a processor did not perform the actual multiplication for values such as 1 and 0, the first option could be a win-win solution (no branch prediction, better performance).

1 answer

Processors perform a regular multiplication with 0s and 1s.

The reason is that if the processor checked for 0 and 1 before each calculation, the added check would itself cost cycles. You would gain performance for the multipliers 0 and 1, but lose performance for every other value (and other values are much more likely).

A simple program can demonstrate this:

    #include <cstdio>   // printf
    #include <cstdlib>  // rand
    #include <ctime>    // clock, clock_t

    // Time 100 million multiplications of aCoefficient by a random value.
    // iSum is never read afterwards, so an optimizing compiler may drop the multiplication.
    void Loop( float aCoefficient )
    {
        float iSum = 0.0f;
        clock_t iStart, iEnd;

        iStart = clock();
        for ( int i = 0; i < 100000000; i++ )
        {
            iSum += aCoefficient * rand();
        }
        iEnd = clock();

        printf( "Coefficient: %f: %li clock ticks\n", aCoefficient, (long)( iEnd - iStart ) );
    }

    int main( int argc, const char * argv[] )
    {
        Loop( 0.0f );
        Loop( 1.0f );
        Loop( 0.25f );
        return 0;
    }

Output:

    Coefficient: 0.000000: 1380620 clock ticks
    Coefficient: 1.000000: 1375345 clock ticks
    Coefficient: 0.250000: 1374483 clock ticks
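A variant of the same measurement can be sketched so that the result is harder for the compiler to discard. The helper timedLoop and the struct LoopResult are hypothetical names, not part of the answer above; the sketch assumes std::chrono::steady_clock as the timer and replaces rand() with a small deterministic operand so that the accumulated sum can be checked. The volatile accumulator only guarantees that each addition is carried out; it does not stop a compiler from specializing the loop for a coefficient it knows at compile time, so for a real measurement the coefficient should come from run-time input.

```cpp
#include <chrono>

// Hypothetical result type: the accumulated sum plus the elapsed time.
struct LoopResult {
    float sum;
    long long nanoseconds;
};

// Hypothetical helper: multiply aCoefficient by a small changing operand
// aIterations times and measure how long the loop takes.
LoopResult timedLoop(float aCoefficient, int aIterations) {
    // volatile: every read and write of iSum must actually be performed.
    volatile float iSum = 0.0f;

    const auto iStart = std::chrono::steady_clock::now();
    for (int i = 0; i < aIterations; i++) {
        // The operand cycles through 0..15 so the sum stays exactly representable.
        iSum = iSum + aCoefficient * static_cast<float>(i % 16);
    }
    const auto iEnd = std::chrono::steady_clock::now();

    const auto iElapsed =
        std::chrono::duration_cast<std::chrono::nanoseconds>(iEnd - iStart);
    return { iSum, static_cast<long long>(iElapsed.count()) };
}
```

Calling timedLoop with 0.0f, 1.0f and 0.25f and the same iteration count gives three timings that can be compared in the same way as the clock tick counts above.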