Delete conditions in program C to speed up

I have a number of crunchy C programs that includes a main loop with two conditional expressions:

for (i = 0; i < N; i++) { for (j = 0; j < N; j++) { for (k = 0; k < N; k++) { if (k == i || k == j) continue; ...(calculate a, b, c, d (depending on k) if (a*a + b*b + c*c < d*d) {break;} } //k } //j } //i 

The hardware is a Cell Processor SPE, where there is a big limitation when using branching. Therefore, to optimize my program for acceleration, I need to remove these 2 conditional expressions, do you know about good strategies for this?

+4
source share
4 answers

For the first, you can break it into several cycles, for example change:

 for(int i = 0; i < 1000; i++) for(int j = 0; j < 1000; j++) { for(int k = 0; k < 1000; k++) { if(k==i || k == j) continue; // other code } } 

in

 for(int i = 0; i < 1000; i++) for(int j = 0; j < 1000; j++) { for(int k = 0; k < min(i, j); k++) { // other code } for(int k = min(i, j) + 1; k < max(i, j); k++) { // other code } for(int k = max(i, j) + 1; k < 1000; k++) { // other code } } 

To remove the second, you can save the previous total and use it in the conditions of the for loop, that is:

 int left_side = 1, right_side = 0; for(int i = 0; i < N; i++) for(int j = 0; j < N; j++) { for(int k = 0; k < min(i, j) && left_side >= right_side; k++) { // other code (calculate a, b, c, d) left_side = a * a + b * b + c * c; right_side = d * d; } for(int k = min(i, j) + 1; k < max(i, j) && left_side >= right_side; k++) { // same as in previous loop } for(int k = max(i, j) + 1; k < N && left_side >= right_side; k++) { // same as in previous loop } } 

Implementing min and max without branching can also be a daunting task. Perhaps this version is better:

 int i, j, k, left_side = 1, right_side = 0; for(i = 0; i < N; i++) { // this loop covers the case where j < i for(j = 0; j < i; j++) { k = 0; for(; k < j && left_side >= right_side; k++) { // other code (calculate a, b, c, d) left_side = a * a + b * b + c * c; right_side = d * d; } k++; // skip k == j for(; k < i && left_side >= right_side; k++) { // same as in previous loop } k++; // skip k == i for(; k < N && left_side >= right_side; k++) { // same as in previous loop } } j++; // skip j == i // and now, j > i for(; j < N; j++) { k = 0; for(; k < i && left_side >= right_side; k++) { // other code (calculate a, b, c, d) left_side = a * a + b * b + c * c; right_side = d * d; } k++; // skip k == i for(; k < j && left_side >= right_side; k++) { // same as in previous loop } k++; // skip k == j for(; k < N && left_side >= right_side; k++) { // same as in previous loop } } } 
+2
source

I agree with 'sje397'.

In addition, you provide too little information about your problem. You say branching is expensive. But how often does this actually happen? Maybe your problem is that the code generated by the compiler branches out in a common script?

Perhaps you could reinstall your if -s. The implementation of if actually depends on the compiler, and many compilers interpret it in a straightforward manner. That is: if - common - else - rare (jump).

Then try the following:

 for (i = 0; i < N; i++) { for (j = 0; j < N; j++) { for (k = 0; k < N; k++) { if (k != i && k != j) { ...(calculate a, b, c, d) if (a*a + b*b + c*c >= d*d) { ... } else break; } } //k } //j } //i 

EDIT:

Of course, you can go to the assembler level to provide the correct generated code.

+1
source

I would look first at your calculate code, because it could drown out all of these branching problems. Some sample research will probably find out.

However, it looks like you are doing for each i,j linear search for the first point inside the sphere. Could you have 3 arrays, one for each of the X, Y, and Z axes, and in each array the indices of all the source points are stored in ascending order along this axis? This may make it easier to find the closest neighbor. In addition, you can use the in-cube test, not the test in the ball, because you are not looking for the nearest point, but only for the neighboring point.

0
source

Are you sure you really need the first if statement? Even if he skips one calculation when k is equal to me or j, the penalty for checking it for each iteration is very expensive. Also, keep in mind that if N is not a constant, the compiler will probably not be able to expand the for loops.

Although, if it is a cell processor, the compiler may even try to vectorize the outlines.

If for loops compile into regular iterative loops, you might have the idea of ​​getting them to compare with zero instead, since the decrement operation will often compare you when it reaches zero.

 for (i = 0; i < N; i++) { 

... can be...

 for (i = N; i != 0; i--) { 

Although, if "i" is used as an index or a variable in the calculation, you can get a performance degradation, as you will get cache misses.

0
source

All Articles