Effective Rolling Max and Min Window

I want to efficiently calculate the maximum and minimum rolling value. The value is nothing better than recalculating the maximum / minimum value of all used values ​​each time you move the window.

There was a message that asked the same thing, and someone posted a solution that included some kind of approach to the stack that supposedly worked based on its rating. However, I cannot find it again for life.

Any help would be appreciated when looking for a solution or publication. Thanks everyone!

+6
source share
4 answers

The algorithm you want to use is called the ascending minima (implementation in C ++) .

To do this in C #, you'll want to get the double finished queue class, and a good one exists in NuGet under the name Nito.Deque .

I wrote a quick C # implementation using Nito.Deque, but I only checked it briefly and made it out of my head to make it wrong!

public static class AscendingMinima { private struct MinimaValue { public int RemoveIndex { get; set; } public double Value { get; set; } } public static double[] GetMin(this double[] input, int window) { var queue = new Deque<MinimaValue>(); var result = new double[input.Length]; for (int i = 0; i < input.Length; i++) { var val = input[i]; // Note: in Nito.Deque, queue[0] is the front while (queue.Count > 0 && i >= queue[0].RemoveIndex) queue.RemoveFromFront(); while (queue.Count > 0 && queue[queue.Count - 1].Value >= val) queue.RemoveFromBack(); queue.AddToBack(new MinimaValue{RemoveIndex = i + window, Value = val }); result[i] = queue[0].Value; } return result; } } 
+3
source

Here is one way to do this more efficiently. You still have to calculate the value from time to time, but, except for certain degenerate data (ever decreasing values) that are minimized in this solution.

We limit ourselves to the maximum simplification, but it is easy to minimize it.

All you need is the following:

  • The window itself is initially empty.
  • The current maximum (max), initially any value.
  • The account of the current maximum (maxcount), initially zero.

The idea is to use max and maxcount as a cache to hold the current maximum. If the cache is valid, you only need to return the value in it, a very fast operation with constant time.

If the cache is invalid when you request the maximum, it populates the cache and then returns that value. This is slower than the method in the previous paragraph, but subsequent requests for the maximum value after the cache is valid use this faster method again.

Here is what you do to maintain the window and its associated data:

  • Get the next N value.

  • If the window is full, delete the earliest entry M If maxcount is greater than 0 and M is max , decrease maxcount . As soon as maxcount reaches 0, the cache is invalid, but we don’t need to worry about it until the user asks for the maximum value (until there is a point refilling the cache).

  • Add N to the skate window.

  • If the window size is now 1 ( N is the only current record), set max to N and maxcount to 1, and then return to step 1.

  • If maxcount greater than 0 and N greater than max , set max to N and maxcount to 1, and then return to step 1.

  • If maxcount greater than 0 and N is max , increment maxcount .

  • Return to step 1.

Now, in _any_point, when this window control continues, you can request the maximum value. This is a separate operation, different from the control window itself. This can be done using the following rules in sequence.

  • If the window is empty, there is no maximum: raise an exception or return some reasonable sentinel value.

  • If maxcount greater than 0, then the cache is valid: just return max .

  • Otherwise, the cache must be re-populated. Scroll through the list by setting max and maxcount according to the code snippet below.


 set max to window[0], maxcount to 0 for each x in window[]: if x > max: set max to x, maxcount to 1 else: if x == max: increment maxcount 

The fact that you basically maintain the maximum value cache and only recount when necessary makes this a much more efficient solution than just blind recounting whenever a record is added.

For some specific statistics, I created the following Python program. It uses a sliding window of size 25 and uses random numbers from 0 to 999 inclusive (you can play with these properties to see how they affect the result).

Enter the initialization code first. Pay attention to the stat variables, they will be used to calculate caches and misses:

 import random window = [] max = 0 maxcount = 0 maxwin = 25 statCache = 0 statPopul = 0 

Then the function is to add the number to the window as per my description above:

 def addNum(n): global window global max global maxcount if len(window) == maxwin: m = window[0] window = window[1:] if maxcount > 0 and m == n: maxcount = maxcount - 1 window.append(n) if len(window) == 1: max = n maxcount = 1 return if maxcount > 0 and n > max: max = n maxcount = 1 return if maxcount > 0 and n == max: maxcount = maxcount + 1 

Next, the code that returns the maximum value from the window:

 def getMax(): global max global maxcount global statCache global statPopul if len(window) == 0: return None if maxcount > 0: statCache = statCache + 1 return max max = window[0] maxcount = 0 for val in window: if val > max: max = val maxcount = 1 else: if val == max: maxcount = maxcount + 1 statPopul = statPopul + 1 return max 

And finally, the test harness:

 random.seed() for i in range(1000000): val = int(1000 * random.random()) addNum(val) newmax = getMax() #print "Added %d, max is %d, sz is %d"%(val,newMax,len(window)) print "%d cached, %d populated"%(statCache,statPopul) 

Please note that the test harness is trying to get the maximum size each time you add a number to the window. In practice, this may not be necessary. In other words, this is the worst case scenario for random data generated.


By executing this program several times for pseudo-statistical purposes, we get (formatted and analyzed for reporting purposes):

  999959 cached, 41 populated 999889 cached, 111 populate 999979 cached, 21 populated 999951 cached, 49 populated 999966 cached, 34 populated 999925 cached, 75 populated 999980 cached, 20 populated 999905 cached, 95 populated 999958 cached, 42 populated 999983 cached, 17 populated ======= === 9999495 505 

Thus, you can see that on average for random data only about 0.005% (one out of every 20,000) cases led to a hit hit. The vast majority used cached values. This should be significantly better than recalculating the maximum at each insertion into the window.

+3
source

I assume that by “window” you mean a range from a[start] to a[start + len] and that start moves. Consider the minimum value, the maximum value is similar, and go to the window a[start + 1] to a[start + len + 1] . Then the minimum value of the window will change only if (a) a[start + len + 1] < min (a lower value has arrived) or (b) a[start] == min (one of the smallest values ​​is left; recalculate minimum).

Another, perhaps more efficient way to do this is to populate the priority queue in the first window and update with each input / output value, but I don’t think it is much better (priority queues do not fit “select a random item from the middle” (which you you have to do it when you promote the window.) And the code will be much more complicated. Better stick to a simple solution until you make sure that the performance is unacceptable and that this code is responsible for (most of) the resource consumption.

0
source

After I wrote my own album yesterday and asked for improvements, I was transferred here. Indeed, this algo is more elegant. I'm not sure that it offers a constant speed, regardless of the size of the window, but regardless of what, I tested the performance compared to my own caching algorithm (quite simple and probably uses the same idea as the others). caching is 8-15 times faster (tested using rolling windows 5,50,300,1000, I no longer need it). below are both alternatives with stopwatch and checking results.

 static class Program { static Random r = new Random(); static int Window = 50; //(small to facilitate visual functional test). eventually could be 100 1000, but not more than 5000. const int FullDataSize =1000; static double[] InputArr = new double[FullDataSize]; //array prefilled with the random input data. //====================== Caching algo variables static double Low = 0; static int LowLocation = 0; static int CurrentLocation = 0; static double[] Result1 = new double[FullDataSize]; //contains the caching mimimum result static int i1; //incrementor, just to store the result back to the array. In real life, the result is not even stored back to array. //====================== Ascending Minima algo variables static double[] Result2 = new double[FullDataSize]; //contains ascending miminum result. static double[] RollWinArray = new double[Window]; //array for the caching algo static Deque<MinimaValue> RollWinDeque = new Deque<MinimaValue>(); //Niro.Deque nuget. static int i2; //used by the struct of the Deque (not just for result storage) //====================================== my initialy proposed caching algo static void CalcCachingMin(double currentNum) { RollWinArray[CurrentLocation] = currentNum; if (currentNum <= Low) { LowLocation = CurrentLocation; Low = currentNum; } else if (CurrentLocation == LowLocation) ReFindHighest(); CurrentLocation++; if (CurrentLocation == Window) CurrentLocation = 0; //this is faster //CurrentLocation = CurrentLocation % Window; //this is slower, still over 10 fold faster than ascending minima Result1[i1++] = Low; } //full iteration run each time lowest is overwritten. static void ReFindHighest() { Low = RollWinArray[0]; LowLocation = 0; //bug fix. missing from initial version. for (int i = 1; i < Window; i++) if (RollWinArray[i] < Low) { Low = RollWinArray[i]; LowLocation = i; } } //======================================= Ascending Minima algo based on http://stackoverflow.com/a/14823809/2381899 private struct MinimaValue { public int RemoveIndex { get; set; } public double Value { get; set; } } public static void CalcAscendingMinima (double newNum) { //same algo as the extension method below, but used on external arrays, and fed with 1 data point at a time like in the projected real time app. while (RollWinDeque.Count > 0 && i2 >= RollWinDeque[0].RemoveIndex) RollWinDeque.RemoveFromFront(); while (RollWinDeque.Count > 0 && RollWinDeque[RollWinDeque.Count - 1].Value >= newNum) RollWinDeque.RemoveFromBack(); RollWinDeque.AddToBack(new MinimaValue { RemoveIndex = i2 + Window, Value = newNum }); Result2[i2++] = RollWinDeque[0].Value; } public static double[] GetMin(this double[] input, int window) { //this is the initial method extesion for ascending mimima //taken from http://stackoverflow.com/a/14823809/2381899 var queue = new Deque<MinimaValue>(); var result = new double[input.Length]; for (int i = 0; i < input.Length; i++) { var val = input[i]; // Note: in Nito.Deque, queue[0] is the front while (queue.Count > 0 && i >= queue[0].RemoveIndex) queue.RemoveFromFront(); while (queue.Count > 0 && queue[queue.Count - 1].Value >= val) queue.RemoveFromBack(); queue.AddToBack(new MinimaValue { RemoveIndex = i + window, Value = val }); result[i] = queue[0].Value; } return result; } //============================================ Test program. static void Main(string[] args) { //this it the test program. //it runs several attempts of both algos on the same data. for (int j = 0; j < 10; j++) { Low = 12000; for (int i = 0; i < Window; i++) RollWinArray[i] = 10000000; //Fill the data + functional test - generate 100 numbers and check them in as you go: InputArr[0] = 12000; for (int i = 1; i < FullDataSize; i++) //fill the Input array with random data. //InputArr[i] = r.Next(100) + 11000;//simple data. InputArr[i] = InputArr[i - 1] + r.NextDouble() - 0.5; //brownian motion data. Stopwatch stopwatch = new Stopwatch(); stopwatch.Start(); for (int i = 0; i < FullDataSize; i++) //run the Caching algo. CalcCachingMin(InputArr[i]); stopwatch.Stop(); Console.WriteLine("Caching : " + stopwatch.ElapsedTicks + " mS: " + stopwatch.ElapsedMilliseconds); stopwatch.Reset(); stopwatch.Start(); for (int i = 0; i < FullDataSize; i++) //run the Ascending Minima algo CalcAscendingMinima(InputArr[i]); stopwatch.Stop(); Console.WriteLine("AscMimima: " + stopwatch.ElapsedTicks + " mS: " + stopwatch.ElapsedMilliseconds); stopwatch.Reset(); i1 = 0; i2 = 0; RollWinDeque.Clear(); } for (int i = 0; i < FullDataSize; i++) //test the results. if (Result2[i] != Result1[i]) //this is a test that algos are valid. Errors (mismatches) are printed. Console.WriteLine("Current:" + InputArr[i].ToString("#.00") + "\tLowest of " + Window + "last is " + Result1[i].ToString("#.00") + " " + Result2[i].ToString("#.00") + "\t" + (Result1[i] == Result2[i])); //for validation purposes only. Console.ReadLine(); } } 
0
source

All Articles