A technique for determining whether an integer sequence can be generated without branches?

Question

If you are optimizing for an architecture on which branches are expensive (say, the PS3's Cell processor), it can be important to determine whether a given algorithm can be expressed without branches, or at least with fewer of them. One pattern I see a lot in unoptimized code is a bunch of ifs used to tweak an index into some array (if the array size is odd, bump the index by 1; under some other circumstance, multiply it by 2; and so on). So it would be nice if there were a way, given two lists of numbers, to determine whether it is possible to write a branchless function that transforms one list into the other.

For example, I recently wanted to know whether it is possible to write a branchless function that transforms 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 into 0, 2, 4, 6, 8, 9, 7, 5, 3, 1 (ascending evens, then descending odds). Technically, I could write one big switch/case function, but obviously I'm interested in a function that follows the pattern for arbitrary sizes. Writing a function to perform this transformation is easy with branching, but if there is a branchless way to do it, it is not immediately obvious.

So, is there a general approach to this kind of problem, or some quick litmus test? Or do you have to come up with a proof on a case-by-case basis? I could work harder on such problems, but that is pointless if they are literally impossible. I have a feeling I was once told that there is a formal mathematical term for functions that use only arithmetic without branching, but I can't remember it.

optimization arrays math branch

Joseph Garvin Oct 30 '09 at 18:14

7 answers

Adisak · Answer 1 · 2009-10-30T19:12:43+0000

The transformation: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 to 0, 2, 4, 6, 8, 9, 7, 5, 3, 1 (ascending evens, then descending odds).

Simple: given a sequence of N values X from 0 to N-1, the first half of the sequence is 2X and the second half is (2N-1) - 2X. The sequence splits at X = (N+1)/2 using integer math. In the example above, N == 10.

Thus, assuming a 32-bit signed int with an arithmetic right shift:

 int Transform(int x)
 {
     const int seq1 = x + x;
     const int mask = (x - ((N + 1) >> 1)) >> 31; /* -1 if x < (N+1)/2, else 0 */
     const int seq2 = (N + N - 1) - seq1;
     return (mask & seq1) | ((~mask) & seq2);
 }

Note that the mask pattern used here is fast because PowerPC has an ANDC instruction (AND with complement), which makes the (~mask) operation free.
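The same mask-select idea can be sketched with the length passed at run time instead of a global N. The names `select_lt` and `transform_n` are hypothetical, and the sketch relies on the same assumptions as the answer: 32-bit signed integers, an arithmetic right shift on negative values (implementation-defined in C), and no overflow in `x - limit`.

```c
#include <stdint.h>

/* Hypothetical helper: yields a when x < limit, otherwise b, without a branch.
   Assumes >> on a negative int32_t is an arithmetic shift (implementation-defined)
   and that x - limit does not overflow. */
int32_t select_lt(int32_t x, int32_t limit, int32_t a, int32_t b)
{
    const int32_t mask = (x - limit) >> 31;   /* all ones if x < limit, else 0 */
    return (mask & a) | (~mask & b);
}

/* Maps 0..n-1 to the ascending evens followed by the descending odds. */
int32_t transform_n(int32_t x, int32_t n)
{
    const int32_t up   = x + x;               /* first half:  2x          */
    const int32_t down = (n + n - 1) - up;    /* second half: (2n-1) - 2x */
    return select_lt(x, (n + 1) >> 1, up, down);
}
```

For n = 10 this reproduces 0, 2, 4, 6, 8, 9, 7, 5, 3, 1; for n = 5 it gives 0, 2, 4, 3, 1.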

Adisak · Answer 2 · 2009-10-30T19:03:14+0000

If you are optimizing for the PS3 in particular, the PowerPC Compiler Writer's Guide covers techniques for branchless code in Section 3.1.5 and lists GNU Superoptimizer sequences for branchless code in Appendix D.

You might also be interested in Mike Acton's Cell Performance.
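To give a flavour of the kind of sequences meant here, the following is a minimal sketch of two well-known branchless idioms. They are not copied from the guide; the function names are hypothetical, and both assume 32-bit signed integers with an arithmetic right shift (implementation-defined in C) and no overflow in the intermediate subtraction.

```c
#include <stdint.h>

/* Branchless absolute value. Assumes arithmetic right shift;
   x must not be INT32_MIN. */
int32_t abs_branchless(int32_t x)
{
    const int32_t mask = x >> 31;      /* -1 if x < 0, else 0 */
    return (x ^ mask) - mask;
}

/* Branchless minimum. Assumes a - b does not overflow. */
int32_t min_branchless(int32_t a, int32_t b)
{
    const int32_t diff = a - b;
    const int32_t mask = diff >> 31;   /* -1 if a < b, else 0 */
    return b + (diff & mask);
}
```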

Victor Liu · Answer 3 · 2009-10-30T19:04:38+0000

If you plot the desired indices against your input indices, you get a triangle shape. It turns out that for your case, n = 10, this is

 9.5 - abs(2*(x - 4.75))

Therefore, for general n this will be

 n - 0.5 - abs(2*(x - n/2 + 0.25))

Or in integer form,

 (2*n-1 - abs(4*x - 2*n + 1)) / 2

This is completely branchless, since your output indices are generated by a single mathematical function. I think a general approach would be to plot the desired indices and look for a pattern and a way to represent it using mathematical functions.
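The integer form above translates directly into C. This is a minimal sketch: the name `fold_index` is hypothetical, and whether the library `abs` compiles to branch-free code is an assumption that depends on the compiler and target.

```c
#include <stdlib.h>

/* Integer form of the triangle function:
   (2n - 1 - |4x - 2n + 1|) / 2.
   Both 2n - 1 and 4x - 2n + 1 are odd, so the numerator is always even
   and the division is exact. */
int fold_index(int x, int n)
{
    return (2 * n - 1 - abs(4 * x - 2 * n + 1)) / 2;
}
```

For n = 10 and x = 0..9 this yields 0, 2, 4, 6, 8, 9, 7, 5, 3, 1, and for n = 5 it yields 0, 2, 4, 3, 1.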

Obviously, if your desired output indices form a straight line, the transformation is simple. If the mapping has a kink, use the absolute value function to introduce the kink, and adjust the scaling to change the angle of the bend. You can tilt the kink by adding a linear term (for example, abs(x)+x/2). If you need a jump discontinuity in your final index function, use the sign function (hopefully built in, or else use abs(x)/x). You need to be creative in using the graphs of common functions to your advantage.

Addendum

If your indexing function is piecewise linear, there is a simple algorithm. Suppose that the desired index function is expressed as a list of segments

 {(sx1,sy1)-(ex1,ey1), (sx2,sy2)-(ex2,ey2), ... , (sxN,syN)-(exN,eyN)}
       segment 1             segment 2                  segment N

where exK > sxK for all K and sxK > sx(K-1) for all K (that is, the segments are ordered from left to right).

 k = 1
 f(x) = Make affine model of segment k
 g(x) = f(x)
 Do:
    k = k + 1
    h(x) = Make affine model of segment k
    If g(x) and h(x) intersect between ex(k-1) and ex(k)
       f(x) = f(x) + [slope difference of g(x) and h(x)] * ramp(x)
    Else
       f(x) = f(x) + (h(ex(k-1)) - f(ex(k-1))) * step(x)
       f(x) = f(x) + [slope difference of g(x) and h(x)] * ramp(x)

where ramp(x) = (abs(x)+x)/2 and step(x) = (sign(x)+1)/2. Here f(x) is the function being built, g(x) is the affine model of the previous segment, and h(x) is the affine model of the current segment. An affine model is just a line in slope-intercept form, a*x+b, and the slope difference is the difference between the two slopes. The algorithm simply proceeds from left to right, adding the right pieces to the function as it goes. The functions it adds are always zero for x <= 0, so they do not affect the part of f(x) that has been built so far.

Of course, there may be some errors or typos. I really need to get to a meeting, so I can't write any more.
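The building blocks can be sketched in C as follows. This is only an illustration under stated assumptions: the pseudocode writes ramp(x) and step(x) unshifted, and the sketch assumes each term is meant to be shifted to its breakpoint, as in ramp(x - 3). The function `piecewise` and its breakpoints are a made-up example, and comparisons such as `x > 0` usually, but not necessarily, compile to branch-free code.

```c
#include <stdlib.h>

/* ramp(x) = (|x| + x) / 2: 0 for x <= 0, x for x > 0. */
int ramp(int x) { return (abs(x) + x) / 2; }

/* sign(x) as -1, 0 or 1, built from comparisons that yield 0 or 1. */
int sign(int x) { return (x > 0) - (x < 0); }

/* step(x) = (sign(x) + 1) / 2: with integer division, 0 for x <= 0, 1 for x > 0. */
int step(int x) { return (sign(x) + 1) / 2; }

/* Hypothetical example function assembled left to right:
   slope 1 up to x = 3, slope 3 from 3 to 6 (a continuous kink),
   then a jump of +10 and slope 0 for x > 6. */
int piecewise(int x)
{
    return x
         + 2 * ramp(x - 3)     /* slope change 1 -> 3 at x = 3 */
         - 3 * ramp(x - 6)     /* slope change 3 -> 0 at x = 6 */
         + 10 * step(x - 6);   /* jump of 10 just after x = 6 */
}
```

Each added term is zero to the left of its breakpoint, so it leaves the earlier part of the function untouched, which is the property the algorithm relies on.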

lhf · Answer 4 · 2009-10-31T10:38:20+0000

You can always write a polynomial formula using Lagrange interpolation, for example. Not nice (or especially fast), but it won't have any branches.
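As a rough sketch of what that looks like for the question's ten-element example: the names `N_POINTS`, `target` and `lagrange_transform` are hypothetical, and the loops have compile-time-constant trip counts, so the assumption is that a compiler could fully unroll them into straight-line arithmetic. The inner loop is split in two to avoid an `if (j != i)` test.

```c
#define N_POINTS 10

/* Target outputs for inputs 0..9, taken from the question. */
static const double target[N_POINTS] = {0, 2, 4, 6, 8, 9, 7, 5, 3, 1};

/* Evaluates the Lagrange polynomial through the points (i, target[i]).
   Intended for integer x in 0..N_POINTS-1, where the result is exact. */
int lagrange_transform(int x)
{
    double sum = 0.0;
    for (int i = 0; i < N_POINTS; i++) {
        double basis = 1.0;                 /* basis polynomial L_i(x) */
        for (int j = 0; j < i; j++)
            basis *= (double)(x - j) / (double)(i - j);
        for (int j = i + 1; j < N_POINTS; j++)
            basis *= (double)(x - j) / (double)(i - j);
        sum += target[i] * basis;
    }
    return (int)(sum + 0.5);                /* round; outputs are non-negative */
}
```

At each integer input, every basis term except one contains a zero factor, so the sum collapses to the wanted table entry. The cost is on the order of N_POINTS squared multiplications and divisions per call, which is why this is unlikely to be fast.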

Toad · Answer 5 · 2009-10-30T18:23:10+0000

If speed is really important, could you write out the instructions explicitly for lists up to a certain length? (One could generate this code, of course.)

So:

 void algorithm1_Length6(int *srcList, int *destList)
 {
     *destList++ = *srcList;
     *destList++ = srcList[2];
     *destList++ = srcList[4];
     *destList++ = srcList[5];
     *destList++ = srcList[3];
     *destList++ = srcList[1];
 }

and likewise for every other length up to a certain size.

Jason s · Answer 6 · 2009-11-01T02:14:11+0000

Technically, any series of operations can be performed without branching by using a state machine built from logical operations. The notion of branching arises because most programs are a series of instructions driven by a program counter that can go one way or another.

Even if you restrict yourself to a purely functional approach, which is stateless, for a finite set of discrete values you can always (at the cost of potentially large amounts of memory) use a lookup table.
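For the question's ten-element example, the lookup-table approach might look like the sketch below. The names `transform_table` and `transform_lookup` are hypothetical, and the function deliberately performs no bounds check, so the caller is assumed to guarantee 0 <= x < 10.

```c
/* Precomputed outputs for inputs 0..9 (the question's example). */
static const unsigned char transform_table[10] = {0, 2, 4, 6, 8, 9, 7, 5, 3, 1};

/* A single indexed load; a bounds check here would reintroduce a branch,
   so the caller must ensure 0 <= x < 10. */
int transform_lookup(int x)
{
    return transform_table[x];
}
```

The trade-off is memory traffic instead of arithmetic: the table has to be built per size, and a load may cost more than a few ALU instructions.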

rachvela · Answer 7 · 2009-10-30T18:46:15+0000

For this array you can use this method:

 void tranform(int[] src, int[] dest) {
     // 0, 2, 4, 6, 8, 9, 7, 5, 3, 1
     dest[0] = src[0];
     dest[1] = src[2];
     dest[2] = src[4];
     dest[3] = src[6];
     dest[4] = src[8];
     dest[5] = src[9];
     dest[6] = src[7];
     dest[7] = src[5];
     dest[8] = src[3];
     dest[9] = src[1];
 }

But in general, for large arrays, it is tedious to write such methods by hand, so it is useful to write a generator method like this:

 static void createFunction(int[] src, int[] dest) {
     System.out.println("void tranform(int[] src, int[] dest) {");
     for (int i = 0; i < dest.length; i++) {
         // Find the source position holding the value wanted at dest[i].
         for (int j = 0; j < src.length; j++) {
             if (dest[i] == src[j]) {
                 System.out.println("dest[" + i + "]=src[" + j + "];");
                 break;
             }
         }
     }
     System.out.println("}");
 }

Call it with the two arrays: createFunction(new int[]{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, new int[]{0, 2, 4, 6, 8, 9, 7, 5, 3, 1});

And paste the output of this method into your program.
