How to optimize this algorithm

I need help getting this bit of code faster:

UnitBase* Formation::operator[](ushort offset) { UnitBase* unit = 0; if (offset < itsNumFightingUnits) { ushort j = 0; for (ushort i = 0; i < itsNumUnits; ++i) { if (unitSetup[i] == UNIT_ON_FRONT) { if (j == offset) unit = unitFormation[i]; ++j; } } } else throw NotFound(); return unit; } 

So, to give some background, I have this Formation class that contains an array of pointers to UnitBase objects called UnitFormation . The UnitBase* array has an array of numbers of the same size that indicates the state of each corresponding UnitBase object, called UnitSetup .

I overloaded the [] operator to return only pointers to those UnitBase objects that have a certain status, so if I ask itsFormation[5] , the function will not necessarily return UnitFormation[5] , but the 5th element of UnitFormation, which has the status UNIT_ON_FRONT .

I tried using the code above, but according to my profiler, it takes too much time. This makes sense, since the algorithm must read all the elements before returning the requested pointer.

Do I need to completely rethink the whole problem, or can it be done somehow faster?

Thanks in advance.

+4
source share
10 answers

I completely reworked the solution using two vectors: one for the units on the front panel and one for the other blocks, and changed all the algorithms so that the block with the changed status immediately moves from one vector to another. Thus, I eliminated counting in the operator [], which was the main bottleneck.

Before using the profiler, I get the calculation time from 5500 to 7000 ms. After looking at the answers here, 1) I changed the loop variables from ushort to int or uint, which reduced the duration by ~ 10%, 2) I made another modification in the secondary algorithm to reduce the duration by another 30% or so, 3) I implemented two vectors, as explained above. This helped reduce the computation time from ~ 3300 ms to ~ 700 ms, another 40%!

In all this, a reduction of 85 - 90%! Thanks SO and profiler.

Next, I’m going to implement an intermediary template and only if necessary call the update function, possibly by calling a few more ms. :)

New code corresponding to the old fragment (now the functionality is completely different):

 UnitBase* Formation::operator[](ushort offset) { if (offset < numFightingUnits) return unitFormation[offset]->getUnit(); else return NULL; } 

Much shorter and bigger. Of course, there were many other heavy modifications, the most important of which is that unitFormation is now std::vector<UnitFormationElement*> , and not just UnitBase** . UnitFormationElement* contains UnitBase* along with some other vital data that used to hang in the Formation class.

+1
source

One quick optimization would be to return the device as soon as you find it, instead of continuing to iterate over all other devices, for example.

 if (j == offset) unit = unitFormation[i]; 

becomes

 if (j == offset) return unitFormation[i]; 

Of course, this helps only if the unit you are looking for is directed to the front of the unitFormation sequence, but this is trivial to do and sometimes helps.

A more active participation, but a more effective way to do this faster will be for each state to create and maintain a linked list of units that have this status. You will do this in parallel with the main block array, and the contents of linked lists will be a pointer to the array of main blocks, so you will not duplicate the block data. Then, to find the specified offset in the status, you can simply go to the offset th node of the linked list, rather than iterating over each module.

Creating a doubly linked list and storing a pointer to the tail will allow you to find items with high offsets as fast as low offsets (starting from the end and in reverse).

However, it will still be slow if there are many units with the same status, and you are looking for one whose offset is close to the middle.

+7
source

How about redesigning your code to maintain a “units on the front” table, no matter what it sounds, it sounds interesting :-). If this part is really requested a lot and does not change often, then you will save time. Instead of checking all or parts of the complete list of units, you will get the result instantly.

PS: int should use the most natural type for your processor, so using ushorts does not make your program faster .

+4
source

In addition to the other suggestions some of them made, you might want to see if any of these calls to this function are being executed, and eliminate these call points. For example, if you see that you call it repeatedly when there is no chance, the result has changed. The fastest code is one that never runs.

+2
source

Is it possible to sort (or insert sorted) your data by UNIT_ON_FRONT status? This would make the function trivial.

+1
source

How often does unit status change? Perhaps you should keep a list of units that have the appropriate status, and only update this list after changing the status.

If it is necessary to minimize the cost of state changes, you can save an array that indicates how many of the first 256 units have a certain status, how many of the following 256 units, etc. It was possible to scan through the array 256 times as fast as it could scan through units, until there was an Nth “good” unit within 256 slots. Changing the state of a device will only require increasing or decreasing one slot in the array.

Other approaches can be used to balance the cost of changing the status of a unit with the cost of finding units based on different usage patterns.

+1
source

One of the problems may be that this function can be called too often. Assuming the share of UNIT_ON_FRONT is constant, the complexity is linear. However, if you call the operator from a loop, this complexity increases to O (N ^ 2).

If instead you returned something like boost::filter_iterator , you could increase the efficiency of those algorithms that need to be iterated over by UNIT_ON_FRONT.

+1
source

This should not have much impact, but you can check the assembly to see if itsNumFightingUnits and itsNumUnits every iteration of the loop or if they are placed in registers. If they load every time, try adding time series at the beginning of the function.

0
source

For this last juice, and if an exception is thrown regularly, it might be worth switching to returning an error code. This ugly code, but not having stack jumps can be a big help. In game development, exceptions and RTTI are usually disabled.

0
source

You outwitted yourself (which everyone sometimes does). You have done the simple O (N ^ 2) task. Just think about what you need to do before overloading operators.

Posted in response to comment:

Try stepping back to a simpler language, such as C, or a subset of C in C ++. Forget about abstractions, collection classes, and all that. See what your program should do and think about your algorithm. Then, if you can simplify it by using container classes and overloading without making it work anymore, then go. Most performance problems are caused by the adoption of simple problems and the complexity of their use, trying to use all the fancy ideas.

For example, you take the operator [] , which is usually considered O (1), and makes it O (N). Then I assume that you use it in some O (N) loop, so you get O (N ^ 2). What you really want to do is loop over the elements of an array that satisfy a certain condition. You could just do it. If there are very few of them, and you do it at a very high frequency, you can create a separate list of them. But keep your data structure simple, simple, and simple. Better to "waste" cycles and only optimize if you really need to.

0
source

Source: https://habr.com/ru/post/1314583/


All Articles