I implemented two different algorithms that do almost the same thing: check visibility from one node to another in a tree of nodes, under a simple rule - a node is visible to another node only if it precedes it along the same branch.
The first method walks from each node up through its parents, skipping the other children in each parent, to collect the tree indices for both nodes, and then uses some basic logic on those index lists to determine visibility. I went with it first because I already had the index methods, which I needed for something else, and I assumed it would be about as fast as possible.
```cpp
bool isVisibleTo(Node * accessor) {
    QList<uint> accessedI = getIndex();
    QList<uint> accessorI = accessor->getIndex();
    if (accessedI.size() > accessorI.size()) {
        return false; // a deeper node is never visible to a shallower one
    } else if (accessedI.size() == accessorI.size()) {
        // same depth: must share all ancestors and precede as a sibling
        for (int i = 0; i < accessedI.size() - 1; ++i) {
            if (accessedI.at(i) != accessorI.at(i)) {
                return false;
            }
        }
        if (accessedI.last() > accessorI.last()) {
            return false;
        }
    }
    // compare the common-depth prefix; the original bound
    // accessorI.size() - (accessorI.size() - accessedI.size())
    // simplifies to accessedI.size()
    for (int i = 0; i < accessedI.size(); ++i) {
        if (accessedI.at(i) > accessorI.at(i)) {
            return false;
        }
    }
    return true;
}
```
The second method traverses the tree itself, walking from each child to its parent and scanning the earlier siblings along the way, so it visits significantly more nodes and, I can only assume, touches more memory pages and cache lines.
```cpp
bool isVisibleTo2(Node * accessor) {
    Node * node = accessor;
    while (node) {
        if (node == this) {
            return true; // found the accessed node on the ancestor chain
        }
        if (node->_parent) {
            // scan the siblings that precede node in its parent
            uint i = node->_parent->_children.indexOf(node);
            while (i) {
                if (node->_parent->_children.at(--i) == this) {
                    return true;
                }
            }
        }
        node = node->_parent;
    }
    return false;
}
```
I expected this to be the slow algorithm for large trees. But it turned out to be 10-20 times faster for small trees, and as the tree size increased it stayed about 4 times faster in recent tests, the last of which took about 20 minutes and used a tree of 10 million nodes (most of that time went to allocating the nodes; the actual visibility checks took only a few seconds).
So what accounts for the performance difference? Both methods give identical results (I verified that thoroughly - there is no work being skipped by the second method), and the first method involves fewer memory hops, which I assumed would be much more cache-friendly; it can also check the depths up front and often finish after a much shorter comparison. Granted, it makes two passes up the tree rather than one, but those passes go straight from parent to parent, skipping the remaining children along the way. And yes, I understand the second method does not always need to go all the way up, but still...
Edit: I switched to -O3 compilation, but the numbers did not change. I also tried changing getIndex() to return a vector instead of a QList, but that actually hurt performance significantly, since the indices have to be inserted in reverse order (i.e. prepended), which is very inefficient for a vector.
Edit 2: I repeated the quick vector test, this time dropping the prepending in favor of regular appends plus a reverse before returning. That made the vector solution a little faster: it is now only 6 times slower than the full traversal rather than 8 times slower. I suspected that allocating the QList might be the main culprit for the poor performance, but there seems to be something else going on as well.