Find the smallest non-negative integer not in the list in linear time using only lists

Consider the function minout :: [Int] -> Int , which takes a list of distinct non-negative integers and returns the smallest non-negative integer not in the list. The behavior of the function when the input contains duplicates does not matter. Can this be implemented in linear time using only lists (without arrays, vectors, or other data structures with efficient random access)?
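
For example, minout [0,1,3,4] should be 2 and minout [1,2,3] should be 0. A naive reference implementation is easy to write but quadratic; a minimal sketch (the name minoutSpec is just for illustration):

    -- Quadratic reference implementation of the specification,
    -- not the linear-time solution being asked for.
    minoutSpec :: [Int] -> Int
    minoutSpec xs = head [i | i <- [0..], i `notElem` xs]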

(It appeared here.)

+5
2 answers

If l contains all the numbers between 0 and (length l) - 1 inclusive, then minout l is length l ; otherwise it lies in [0..(length l - 1)] . So minout l always lies in [0..(length l)] , and only those elements of l that are in [0..(length l - 1)] are relevant; the remaining elements can be discarded. Using this idea we can implement a divide-and-conquer solution that runs in linear time. Unlike mergesort, at each stage of the recursion we recurse into only one of the two sublists, each of which is at most about half the size of the original (after doing a linear amount of work). This gives linear time complexity.

    minout :: [Int] -> Int
    minout = minoutaux 0
      where
        minoutaux :: Int -> [Int] -> Int
        -- \base list -> smallest integer >= base not occurring in list
        minoutaux base []  = base
        minoutaux base [x] = base + (if x == base then 1 else 0)
        minoutaux base xs  = if length smallpart == n2
                               then minoutaux (base + n2) bigpart
                               else minoutaux base smallpart
          where
            n  = length xs
            n2 = n `div` 2
            smallpart = [x | x <- xs, base <= x,      x < base + n2]
            bigpart   = [x | x <- xs, base + n2 <= x, x < base + n ]
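
For instance, a quick check in GHCi should give:

    ghci> minout [4,0,6,1,3]
    2
    ghci> minout [0,1,2]
    3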

In the above code, minoutaux is a function that takes a "base" integer and a list of distinct entries, and returns the smallest integer that is at least base and does not appear in the list. To do this it discards the "unnecessary" elements that can be ignored, as explained earlier, and builds two lists: the numbers that lie in [ base , base + n2 ) (called smallpart ) and those in [ base + n2 , base + n ) (called bigpart ). Since the entries are distinct, smallpart has length at most n2 (it draws from a range of n2 values), and bigpart has length at most n - n2 . If length smallpart == n2 , then smallpart contains every number in [ base , base + n2 ), so the answer must lie in bigpart ; otherwise there is a "gap" in smallpart , so the answer lies in smallpart .
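
For concreteness, here is how the recursion unfolds on a small example (the intermediate values follow directly from the definitions of n, n2, smallpart and bigpart above):

    -- minout [4,0,6,1,3]  =  minoutaux 0 [4,0,6,1,3]
    --   n = 5, n2 = 2
    --   smallpart = [0,1]   (elements in [0,2))   -- length 2 == n2: [0,2) is fully covered
    --   bigpart   = [4,3]   (elements in [2,5))   -- note that 6 is discarded
    --   => minoutaux 2 [4,3]
    --        n = 2, n2 = 1
    --        smallpart = []  (elements in [2,3))  -- length 0 /= n2: the gap is in [2,3)
    --        => minoutaux 2 []  =  2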

Why does this run in linear time? First, the whole list of length N is traversed a few times, which takes, let's say, roughly 10N operations. Then minoutaux is called on a smaller list of length at most about N/2, which costs at most another 10N/2 operations. Then 10N/4, 10N/8, and so on. Adding all of this up gives the bound 10(2N) = 20N operations. (The constant 10 is just used as an example.)
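
Written out as a sum, with a generic constant c in place of the illustrative 10, the total work is bounded by a geometric series:

    \[ cN + \tfrac{cN}{2} + \tfrac{cN}{4} + \cdots \;\le\; cN \sum_{k \ge 0} 2^{-k} \;=\; 2cN \;=\; O(N) \]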

Here we pass over the list several times: once to compute its length, once to compute smallpart , once to compute bigpart , and so on. This could easily be optimized by doing all of it in a single traversal. However, it is still a linear amount of work either way, and I wanted to keep the code clear rather than optimize constant factors.
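
For example, a single-traversal split might look like the sketch below. The name minout' and the exact bookkeeping are my own, not part of the original answer; the recursion structure is unchanged, and the length of the current list is threaded through the recursion so it never has to be recomputed.

    import Data.List (foldl')

    minout' :: [Int] -> Int
    minout' xs = go 0 (length xs) xs
      where
        -- go base n ys: smallest integer >= base not occurring in ys,
        -- where n is the length of ys.
        go :: Int -> Int -> [Int] -> Int
        go base _ []  = base
        go base _ [x] = base + (if x == base then 1 else 0)
        go base n ys
          | nSmall == n2 = go (base + n2) nBig   bigpart
          | otherwise    = go base        nSmall smallpart
          where
            n2 = n `div` 2
            -- a single pass computes both sublists and their lengths
            (nSmall, nBig, smallpart, bigpart) = foldl' step (0, 0, [], []) ys
            step (cs, cb, small, big) x
              | base <= x && x < base + n2     = (cs + 1, cb, x : small, big)
              | base + n2 <= x && x < base + n = (cs, cb + 1, small, x : big)
              | otherwise                      = (cs, cb, small, big)  -- irrelevant value, discard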

This question and its solution are not originally mine; I came across them in a class where I was learning Haskell.

+12

Richard Bird treats precisely this problem in Chapter 1 of his book Pearls of Functional Algorithm Design. (That chapter happens to be the preview available on Amazon.)

+4