Optimization of a large if-else branch with binary search

So, in my program there is an if-else chain with about 30 branches. This code runs more than 100 times per second, so I saw an opportunity to optimize: instead of the linear if-else checks, I did a binary search over a sorted array of function pointers (essentially an almost balanced search tree). But it ran slower, at about 70% of the previous speed.

I wrote a simple test program to investigate the problem, and it gave a similar result: the if-else version is faster, both with compiler optimizations and without them.

I also counted the number of comparisons made and, as expected, the binary search did about half as many comparisons as the plain if-else chain. But it was still 20-30% slower.

I want to know where the computing time goes, and why a linear if-else is faster than a logarithmic binary search.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

long long ifElseCount = 0;
long long binaryCount = 0;

int ifElseSearch(int i) {
    ++ifElseCount;
    if (i == 0) { return 0; }
    ++ifElseCount;
    if (i == 1) { return 1; }
    ++ifElseCount;
    if (i == 2) { return 2; }
    ++ifElseCount;
    if (i == 3) { return 3; }
    ++ifElseCount;
    if (i == 4) { return 4; }
    ++ifElseCount;
    if (i == 5) { return 5; }
    ++ifElseCount;
    if (i == 6) { return 6; }
    ++ifElseCount;
    if (i == 7) { return 7; }
    ++ifElseCount;
    if (i == 8) { return 8; }
    ++ifElseCount;
    if (i == 9) { return 9; }
    return -1; /* not reached for inputs 0-9; avoids falling off the end */
}

int getZero(void) { return 0; }
int getOne(void) { return 1; }
int getTwo(void) { return 2; }
int getThree(void) { return 3; }
int getFour(void) { return 4; }
int getFive(void) { return 5; }
int getSix(void) { return 6; }
int getSeven(void) { return 7; }
int getEight(void) { return 8; }
int getNine(void) { return 9; }

struct pair {
    int n;
    int (*getN)(void);
};

/* deliberately unsorted; main() sorts it with qsort before searching */
struct pair zeroToNine[10] = {
    {0, getZero}, {2, getTwo}, {4, getFour}, {6, getSix}, {8, getEight},
    {9, getNine}, {7, getSeven}, {5, getFive}, {3, getThree}, {1, getOne},
};

int sortCompare(const void *p, const void *p2) {
    if (((struct pair *)p)->n < ((struct pair *)p2)->n) { return -1; }
    if (((struct pair *)p)->n > ((struct pair *)p2)->n) { return 1; }
    return 0;
}

int searchCompare(const void *pKey, const void *pElem) {
    ++binaryCount;
    if (*(int *)pKey < ((struct pair *)pElem)->n) { return -1; }
    if (*(int *)pKey > ((struct pair *)pElem)->n) { return 1; }
    return 0;
}

int binarySearch(int key) {
    return ((struct pair *)bsearch(&key, zeroToNine, 10,
                                   sizeof(struct pair), searchCompare))->getN();
}

struct timer {
    clock_t start;
    clock_t end;
};

void startTimer(struct timer *timer) { timer->start = clock(); }
void endTimer(struct timer *timer) { timer->end = clock(); }

double getSecondsPassed(struct timer *timer) {
    return (timer->end - timer->start) / (double)CLOCKS_PER_SEC;
}

int main(void) {
#define nTests 500000000
    struct timer timer;
    int i;

    srand((unsigned)time(NULL));
    printf("%d\n\n", rand());

    for (i = 0; i < 10; ++i) { printf("%d ", zeroToNine[i].n); }
    printf("\n");
    qsort(zeroToNine, 10, sizeof(struct pair), sortCompare);
    for (i = 0; i < 10; ++i) { printf("%d ", zeroToNine[i].n); }
    printf("\n\n");

    startTimer(&timer);
    for (i = 0; i < nTests; ++i) { ifElseSearch(rand() % 10); }
    endTimer(&timer);
    printf("%f\n", getSecondsPassed(&timer));

    startTimer(&timer);
    for (i = 0; i < nTests; ++i) { binarySearch(rand() % 10); }
    endTimer(&timer);
    printf("%f\n", getSecondsPassed(&timer));

    printf("\n%lli %lli\n", ifElseCount, binaryCount);
    return EXIT_SUCCESS;
}

Possible output:

78985494

0 2 4 6 8 9 7 5 3 1
0 1 2 3 4 5 6 7 8 9

12.218656
16.496393

2750030239 1449975849
Tags: optimization, linear-search, binary-search
2 answers

You should look at the generated instructions to be sure ( gcc -S source.c ), but it usually comes down to these three things:

1) N is too small.

If you have only 8 possible branches, a linear scan performs about 4-5 checks on average ((N + 1) / 2 assuming equally probable cases; with a skewed distribution it can be even faster).

A binary search needs only log2(8) == 3 checks, but each of those checks is much more complicated, so more code gets executed overall.

So if your N is not in the hundreds, it probably does not make sense to do this. You can profile to find the actual break-even value of N.

2) Branch prediction is harder.

With a linear search, each condition is true in only 1/N of cases, which means the compiler and the branch predictor can assume "branch not taken" and recover from a misprediction only once, at the matching case. With a binary search, each comparison can go either way, so you are likely to flush the pipeline once at every level. And for N < 1024 or so, that roughly 1/log(N) misprediction rate actually hurts performance.

3) Function pointers are slow

To call through a function pointer, you must fetch the pointer from memory, the target function's code must be loaded into the instruction cache, and then you execute the call, set up the stack frame, and return. Functions called through a pointer generally cannot be inlined, so that is several extra instructions, plus memory accesses, plus moving things in and out of the cache. It adds up quickly.


In general, this only makes sense for large N, and you should always profile before applying such optimizations.


Use the switch statement.

Compilers are smart. They will generate the most efficient code for your specific set of values. They will even perform a binary search (with inlined code) if they deem it more efficient.

And as a huge bonus, the code stays readable and does not require changes in half a dozen places to add a new case.

PS. Obviously, your code was a good learning experience. Now you know, so don't do it again :-)

