The “logically slow” algorithm is faster, but why?

I implemented two different algorithms that do almost the same thing: check visibility from one node to another in a tree of nodes, with a simple rule - a node is visible to another node only if it precedes it along the same branch.

The first method walks up the tree from child to parent, skipping the other children in each parent, to get the tree index paths for both nodes, and then uses some basic logic on those paths to determine whether there is visibility. I went with it first because I already had the node-index methods, which I needed for something else, and I assumed it would be about as fast as possible.

bool isVisibleTo(Node * accessor) {
    QList<uint> accessedI = getIndex();
    QList<uint> accessorI = accessor->getIndex();

    if (accessedI.size() > accessorI.size()) {
        return false;
    } else if (accessedI.size() == accessorI.size()) {
        for (int i = 0; i < accessedI.size() - 1; ++i) {
            if (accessedI.at(i) != accessorI.at(i)) {
                return false;
            }
        }
        if (accessedI.last() > accessorI.last()) {
            return false;
        }
    }

    for (int i = 0; i < accessorI.size() - (accessorI.size() - accessedI.size()); ++i) {
        if (accessedI.at(i) > accessorI.at(i)) {
            return false;
        }
    }
    return true;
}

The second traverses the tree node by node, from each child up to its parent and so on, going through significantly more nodes and, I can only assume, more memory pages and cache lines.

bool isVisibleTo2(Node * accessor) {
    Node * node = accessor;
    while (node) {
        if (node == this)
            return true;
        if (node->_parent) {
            uint i = node->_parent->_children.indexOf(node);
            while (i) {
                if (node->_parent->_children.at(--i) == this) {
                    return true;
                }
            }
        }
        node = node->_parent;
    }
    return false;
}

I expected this to be the slow algorithm for large trees. But it turned out to be 10-20 times faster for small trees, and as the tree size increased it stayed about 4 times faster in recent tests, the last of which took about 20 minutes and involved 10 million nodes in the tree (most of that time went into allocating the nodes; the actual visibility checks took a few seconds).

So what explains the performance difference? Both give identical results (I verified that thoroughly - the second method is not skipping any work), and the first method makes fewer memory hops, so I assumed it would be much friendlier to the cache, and it can also just compare the depths and bail out with a much shorter evaluation. Granted, it makes two passes instead of one, but those passes go straight from parent to parent and skip the rest of the children along the way. And yes, I realize the second method does not always need to go all the way up, but still...

Edit: I switched to -O3 compilation, but the numbers did not change. I also tried changing the getIndex list to a vector, but that actually caused a significant performance hit, since the indices need to be inserted in reverse order, i.e. prepended rather than appended, which is very inefficient for a vector.

Edit 2: I repeated the quick test with the vector, this time dropping the prepend in favor of regular appends plus a single reverse before returning. That made the vector solution a little faster - now only about 6 times slower than the full traversal instead of 8 times slower. I suspected that the QList allocations might be the main culprit for the poor performance, but there seems to be something else going on.
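For illustration, a minimal sketch of the append-and-reverse variant described above (the real getIndex is not shown here, so the Node layout and the helper name are assumptions):

#include <algorithm>
#include <vector>

struct Node {
    Node * _parent = nullptr;
    std::vector<Node *> _children;
};

// Build the path of child positions from the root down to 'node' by
// appending while walking up, then reversing once at the end, instead
// of prepending at every level (which is quadratic for a vector).
std::vector<unsigned> getIndexAppendReverse(Node * node) {
    std::vector<unsigned> index;
    while (node->_parent) {
        const std::vector<Node *> & siblings = node->_parent->_children;
        index.push_back(static_cast<unsigned>(
            std::find(siblings.begin(), siblings.end(), node) - siblings.begin()));
        node = node->_parent;
    }
    std::reverse(index.begin(), index.end());
    return index;
}

The point is that each level costs one push_back plus a single O(depth) reverse at the end, instead of shifting the whole vector on every prepend.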

+7
c++ performance visibility algorithm tree
4 answers

If I understood correctly, the getIndex() function you call in the first case performs basically the same walk up the tree that isVisibleTo2() does. But isVisibleTo() has additional work on top of those getIndex() calls, so it is slower.
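The question does not show getIndex(), but presumably it looks something like this (a guess, reusing the _parent / _children members from the posted snippets):

#include <QList>

struct Node {
    Node * _parent = nullptr;
    QList<Node *> _children;

    // Presumed shape of getIndex(): walk up to the root, recording the
    // position of each node within its parent. That is the same
    // parent-chain walk isVisibleTo2() does, plus an indexOf() per
    // level and a growing heap-allocated list.
    QList<uint> getIndex() {
        QList<uint> index;
        Node * node = this;
        while (node->_parent) {
            index.prepend(node->_parent->_children.indexOf(node));
            node = node->_parent;
        }
        return index;
    }
};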

+1

One of the big flaws of the first method is that it seems to allocate space for up to 2n node indices every time it is called (in your QLists), which probably destroys your cache locality and eats up your heap.

My bigger concern: are you 100% sure the two functions do the same thing? I know you ran tests with a large number of nodes, but did you actually verify that the functions return the same answer across different trees?
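One way to get that confidence is a randomized cross-check along these lines (a sketch, assuming the question's Node class with _parent, _children and both isVisibleTo methods accessible, and default-constructible for the test):

#include <QList>
#include <QRandomGenerator>
#include <QtAlgorithms>
#include <cassert>

// Build a random tree, then check that both implementations agree for
// every ordered pair of nodes.
void crossCheck(int nodeCount) {
    QList<Node *> all;
    all.append(new Node);                                   // root
    for (int n = 1; n < nodeCount; ++n) {
        Node * parent = all.at(QRandomGenerator::global()->bounded(int(all.size())));
        Node * child = new Node;
        child->_parent = parent;
        parent->_children.append(child);
        all.append(child);
    }
    for (Node * a : all)
        for (Node * b : all)
            assert(a->isVisibleTo(b) == a->isVisibleTo2(b));
    qDeleteAll(all);   // every node was allocated here and appears exactly once
}

Running this over a range of node counts and tree shapes would say a lot more than a single large test.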

0

More information about the actual structure and methods might change this, but:

another possible difference here is branch predictability.

It seems that the second version may be more predictable than the first.

0

It's simple. The problem with the first version is that you call getIndex() twice, which allocates memory. You can prove or disprove this by posting your code for getIndex().

There is nothing "logically slower" about a method that avoids calling an expensive function, by the way.
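To illustrate the point (only a sketch, not code from the question, and it assumes the same _parent / _children members): the depth-comparison idea does not need to build index lists at all.

// Allocation-free check with the same meaning as the sibling-scanning
// method: compare depths, then look at a single ancestor.
bool isVisibleToNoAlloc(Node * accessed, Node * accessor) {
    int depthAccessed = 0, depthAccessor = 0;
    for (Node * n = accessed; n->_parent; n = n->_parent) ++depthAccessed;
    for (Node * n = accessor; n->_parent; n = n->_parent) ++depthAccessor;
    if (depthAccessed > depthAccessor)
        return false;                     // a deeper node cannot precede accessor
    Node * x = accessor;
    while (depthAccessor-- > depthAccessed)
        x = x->_parent;                   // accessor's ancestor at accessed's depth
    if (x == accessed)
        return true;                      // accessed lies on accessor's own branch
    return x->_parent != nullptr
        && x->_parent == accessed->_parent
        && accessed->_parent->_children.indexOf(accessed)
             < x->_parent->_children.indexOf(x);   // earlier sibling at that level
}

It keeps the "compare depths first" shortcut without any of the heap traffic that getIndex() brings.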

0
