About predication (removing data-dependent branches)

I came across this sentence in the paper: "branches that depend on the data are converted in order to avoid mispredicted branches" (page 6).

I don't understand how to transform code so that its branches no longer depend on the data.

The paper: http://www.adms-conf.org/p1-SCHLEGEL.pdf

Update: how can the branches in a binary search be converted?

+4
2 answers

The basic idea (I assume) would be to change something like:

    if (a > b)
        return "A is greater than B";
    else
        return "A is less than or equal to B";

into:

    static char const *strings[] = {
        "A is less than or equal to B",
        "A is greater than B"
    };
    return strings[a > b];

For the branches in a binary search, consider the basic idea of a "normal" binary search, which usually looks (at least roughly) like this:

    while (left < right) {
        middle = (left + right) / 2;
        if (search_for < array[middle])
            right = middle;
        else
            left = middle;
    }

We could get rid of the branch here in much the same way:

    size_t positions[2] = { left, right };  /* positions[0] acts as left, positions[1] as right */
    while (positions[1] - positions[0] > 1) {
        size_t middle = (positions[0] + positions[1]) / 2;
        /* search_for < array[middle] selects index 1 ("right" shrinks),
         * otherwise index 0 ("left" advances) -- no conditional branch. */
        positions[search_for < array[middle]] = middle;
    }

[For general-purpose code, use left + (right - left)/2 instead of (left + right)/2 to avoid overflow when left + right exceeds the range of the index type.]

+11

Removing branches is not always optimal, even (especially?) with two-way conditions like this one. I used to remove branches this way myself. After running into a case where code with a conditional branch inside a loop ran faster than the equivalent branch-free code, I did some research into processor execution strategies.

I knew that the ARM instruction set has conditional instructions that can make a conditional branch faster than the kind of branch-free code in the paper, but I was working on Intel (and the branch in question wasn't that kind anyway). It turns out that modern processors, including Intel's, sometimes do this kind of conversion for you: they use eager execution when both arms of the branch are short enough. That is, the CPU feeds both possible paths into the pipeline, executes both, and keeps only the result of the correct one once the condition is known. This avoids the misprediction penalty without having to index into an array. See https://en.wikipedia.org/wiki/Speculative_execution#Variants for more details.

Writing truly optimal code for a processor takes a great deal of detailed knowledge about it, even in assembly. That is why naive implementations are sometimes faster than hand-optimized ones that overlook some characteristic of the processor. The same thing can happen with optimizing compilers: a more complex, hand-"optimized" version sometimes defeats the compiler's own optimizations and runs slower than a naive implementation that the compiler can optimize fully.

When in doubt, if performance is critical and there is time, it is usually best to try it both ways and measure which is faster!

0
