Quickly generate a "sequence of triangles": avoiding false predictions

I am interested in calculating the sequence of triangles 1 which is a sequence of pairs (i, j): (0, 0), (1, 0), (1, 1), (2, 0), (2, 1) ... which is repeated, although all pairs (i, j) with the restriction i >= j . The same sequence c is interesting, but with the restriction i> j.

These sequences, among other things, are all ways to select 2 (possibly identical) elements from a set of n-elements (for a sequence up to (n, n) 2 ) or indices of the lower triagular elements of matrix 3 . The sequence of values ​​is only for i A003056 in OEIS, while j is A002262 . This sequence often occurs in combinatorial algorithms, where their performance can be critical.

A simple but branched way to generate the following value in a sequence:

 if (i == j) { j = 0; i++; } else { j++; } } 

However, when calculating the initial elements of the sequence when checking the condition (i == j) - this suffers from many incorrect predictions, usually one incorrect prediction every time i increases. As the sequence increases, the number of erroneous forecasts becomes smaller, since i increases with a reduced frequency, therefore, the j++ branch dominates and is well predicted. However, some types of combinatorial searches are repeated many times by small terms in a sequence, so I'm looking for a branchless approach or some other approach that suffers less error errors.

For many applications, the order of the sequences is not so important, so generating differnet values ​​than mentioned above is acceptable if it leads to a better solution. For example, j can calculate, not up: (0, 0), (1, 1), (1, 0), (2, 2), (2, 1), ...


1 I am also interested to know what is the correct name for this sequence (perhaps that’s why I better named this question). I'm just kind of a "triangular sequence."

2 Here version i >= j represents subsets (repetition is allowed), and option i > j represents normal subsets (without repetition).

3 Here version i >= j includes the main diagonal, while version i > j excludes it.

+5
source share
4 answers

The following are two low-cost approaches that do not use costly computing. The comparison and logical AND are used first:

 const bool eq = i == j; i += eq; j = (j + 1) & (eq - 1); 

The second uses comparison and multiplication:

 const bool eq = i == j; i += eq; j = (j + 1) * (1 - eq); 

In theory, the “multiplication” option should be slower than the “logical” one, but the measurements show very little difference.

Both approaches will lead to unallocated code only for processors that allow for branched comparisons (for example, x86). In addition, these approaches require implementation using a language where the results of conditional expressions can be easily converted to integers (for example, C / C ++, where "false" comparisons are converted to zero integers, and "true" ones are integers, equal to "1").

The only problem with these approaches is performance. They could theoretically get ahead of the branched code, but only when erroneous forecasts very often occur. A simple test, in which there is no other work besides generating a "sequence of triangles" (see It on ideone ), shows the miserable speed of incorrect prediction, and therefore both uncommon methods are about 3 times slower than branchy. The explanation is simple: for longer sequences there should not be too many incorrect predictions; as for shorter ones, modern processors have very good industry predictors, which almost never fail in the case of short branching structures; therefore, we don’t have many erroneous forecasts, branched code almost always executes only 2 commands (compare, increment), while non-redistributable code executes both active and forced "branches" plus some instructions specific to a non-network approach.


If you want to repeatedly iterate over the small terms in the sequence , a different approach might be preferable. You calculate a sequence only once, then read it many times from memory.

+5
source

In Python, we can express this as:

 i, j = i + (i == j), (j + 1) * (i != j) 

but it turns out, a million iterations or so on my machine, the following, longer, lazy evaluation code is about 20% faster:

 from itertools import count, repeat def gen_i(): """ A003056 """ for x in count(0): # infinitely counts up yield from repeat(x, x + 1) # replication def gen_j(): """ A002262 """ for x in count(0): # infinitely counts up yield from range(x + 1) # count up to (including) x sequence = zip(gen_i(), gen_j()) for _ in range(1000000): i, j = next(sequence) 

In the above code, gen_i() , gen_j() , count() , repeat() and zip() all generators (and range() is an iterator), so sequence continues to call the code on demand, since new ones are required (i, j ) couples. I am assuming that the implementation of range() and repeat() ends up with incorrect prediction.

Simple is optionally also fast (i.e., consider all unnecessary additions of zero and multiplication by one in compact form.)

So, more importantly, to quickly generate a sequence or to avoid incorrect predictions?

+1
source

It seems so simple that I'm sure I missed a lot ... but aren't you just looking for such a nested loop (pseudocode)

 for(i=0, i<4, i++) for(j=0, j<i+1, j++) print(i,j) 
0
source

You can get j from i:

 ...set val... old_j = j; j = (j + 1) % (i + 1); if (i == old_j) { i++; } ...loop if more... 

And then print i increment from j and current i:

 ...set val... old_j = j; j = (j + 1) % (i + 1); i = i + (i / old_j); ...loop if more... 

(I can not check it at the moment ... Please view)

0
source

All Articles