Here is one way to do this more efficiently. You still have to calculate the value from time to time, but, except for certain degenerate data (ever decreasing values) that are minimized in this solution.
We limit ourselves to the maximum simplification, but it is easy to minimize it.
All you need is the following:
- The window itself is initially empty.
- The current maximum (max), initially any value.
- The account of the current maximum (maxcount), initially zero.
The idea is to use max and maxcount as a cache to hold the current maximum. If the cache is valid, you only need to return the value in it, a very fast operation with constant time.
If the cache is invalid when you request the maximum, it populates the cache and then returns that value. This is slower than the method in the previous paragraph, but subsequent requests for the maximum value after the cache is valid use this faster method again.
Here is what you do to maintain the window and its associated data:
Get the next N value.
If the window is full, delete the earliest entry M If maxcount is greater than 0 and M is max , decrease maxcount . As soon as maxcount reaches 0, the cache is invalid, but we don’t need to worry about it until the user asks for the maximum value (until there is a point refilling the cache).
Add N to the skate window.
If the window size is now 1 ( N is the only current record), set max to N and maxcount to 1, and then return to step 1.
If maxcount greater than 0 and N greater than max , set max to N and maxcount to 1, and then return to step 1.
If maxcount greater than 0 and N is max , increment maxcount .
Return to step 1.
Now, in _any_point, when this window control continues, you can request the maximum value. This is a separate operation, different from the control window itself. This can be done using the following rules in sequence.
If the window is empty, there is no maximum: raise an exception or return some reasonable sentinel value.
If maxcount greater than 0, then the cache is valid: just return max .
Otherwise, the cache must be re-populated. Scroll through the list by setting max and maxcount according to the code snippet below.
set max to window[0], maxcount to 0 for each x in window[]: if x > max: set max to x, maxcount to 1 else: if x == max: increment maxcount
The fact that you basically maintain the maximum value cache and only recount when necessary makes this a much more efficient solution than just blind recounting whenever a record is added.
For some specific statistics, I created the following Python program. It uses a sliding window of size 25 and uses random numbers from 0 to 999 inclusive (you can play with these properties to see how they affect the result).
Enter the initialization code first. Pay attention to the stat variables, they will be used to calculate caches and misses:
import random window = [] max = 0 maxcount = 0 maxwin = 25 statCache = 0 statPopul = 0
Then the function is to add the number to the window as per my description above:
def addNum(n): global window global max global maxcount if len(window) == maxwin: m = window[0] window = window[1:] if maxcount > 0 and m == n: maxcount = maxcount - 1 window.append(n) if len(window) == 1: max = n maxcount = 1 return if maxcount > 0 and n > max: max = n maxcount = 1 return if maxcount > 0 and n == max: maxcount = maxcount + 1
Next, the code that returns the maximum value from the window:
def getMax(): global max global maxcount global statCache global statPopul if len(window) == 0: return None if maxcount > 0: statCache = statCache + 1 return max max = window[0] maxcount = 0 for val in window: if val > max: max = val maxcount = 1 else: if val == max: maxcount = maxcount + 1 statPopul = statPopul + 1 return max
And finally, the test harness:
random.seed() for i in range(1000000): val = int(1000 * random.random()) addNum(val) newmax = getMax() #print "Added %d, max is %d, sz is %d"%(val,newMax,len(window)) print "%d cached, %d populated"%(statCache,statPopul)
Please note that the test harness is trying to get the maximum size each time you add a number to the window. In practice, this may not be necessary. In other words, this is the worst case scenario for random data generated.
By executing this program several times for pseudo-statistical purposes, we get (formatted and analyzed for reporting purposes):
999959 cached, 41 populated 999889 cached, 111 populate 999979 cached, 21 populated 999951 cached, 49 populated 999966 cached, 34 populated 999925 cached, 75 populated 999980 cached, 20 populated 999905 cached, 95 populated 999958 cached, 42 populated 999983 cached, 17 populated
Thus, you can see that on average for random data only about 0.005% (one out of every 20,000) cases led to a hit hit. The vast majority used cached values. This should be significantly better than recalculating the maximum at each insertion into the window.