Search for a unique (occurring exactly once) element in Haskell

Question


I need a function that takes a list and returns a unique element if one exists, or [] if there is none. If there are several unique elements, it should return the first one (without wasting time looking for the others). In addition, I know that all the elements in the list come from a (small and well-known) set A. For example, this function does the job for Ints:

    import Data.List (group, sort)

    unique :: Ord a => [a] -> [a]
    unique li = first $ filter ((== 1) . length) ((group . sort) li)
      where first [] = []
            first (x:_) = x

    ghci> unique [3,5,6,8,3,9,3,5,6,9,3,5,6,9,1,5,6,8,9,5,6,8,9]
    [1]

This, however, is not good enough because it involves sorting (O(n log n)), while it can be done in linear time (since A is small). In addition, it requires the element type to be Ord, while all that should be needed is Eq. It would also be nice if the number of comparisons were as small as possible (i.e., if we traverse the list and encounter an element el twice, we should not test any subsequent elements for equality with el).

This is why, for example, "Counting unique items in a list" does not solve the problem: all the answers there involve either sorting or traversing the entire list to count every item.

Question: how to do it correctly and efficiently in Haskell?


algorithm functional-programming haskell

Piotr lopusiewicz Apr 17 '13 at 10:34


6 answers

In fact, there is no way to do this efficiently with only Eq. You would have to use a much less efficient way of grouping equal items, and you cannot know that there is only one of a particular item without scanning the entire list.

Also, note that to avoid useless comparisons you need a way to check whether an element has already been seen more than once. The only way to do that is to keep a list of elements known to occur multiple times, and the only way to check whether the current element is on that list is to compare it for equality with every element of it.

If you want this to run faster than O(something really awful), you need an Ord constraint.

Well, based on the explanation in the comments, here is a quick and dirty example of what I think you're looking for:

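    import Data.List (delete)

    unique :: Eq a => [a] -> [a] -> [a] -> Maybe a
    unique [] _ _ = Nothing
    unique _ [] [] = Nothing
    unique _ (r:_) [] = Just r
    unique candidates results (x:xs)
      | x `notElem` candidates = unique candidates results xs
      | x `elem` results = unique (delete x candidates) (delete x results) xs
      -- append, so the head of results is the earliest unique element
      | otherwise = unique candidates (results ++ [x]) xs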

The first argument is the list of candidates, initially all the possible elements. The second argument is the list of possible results, which should initially be empty. The third argument is the list to check.

If it runs out of candidates, or reaches the end of the list with no results, it returns Nothing. If it reaches the end of the list with results, it returns the element at the head of the results list.

Otherwise, it checks the next input element: if it is not a candidate, it ignores it and continues. If it is already in the results list, we have seen it twice, so it is removed from both the results and the candidates, and the search continues. Otherwise, it is added to the results and the search continues.

Unfortunately, it is still necessary to scan the entire list even for a single result, because that is the only way to make sure it is truly unique.
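To watch the candidate and result lists evolve, here is a small sketch that replays the same three-way step with `scanl` and records the state after each input element. It follows the step described above but is an illustration, not the answer's code: `step` and `traceStates` are hypothetical names, results are kept in first-seen order, and the early exit on an empty candidate list is not modelled.

```haskell
import Data.List (delete)

-- One step of the candidates/results algorithm described above.
-- candidates: elements that might still be unique
-- results:    elements seen exactly once so far, in first-seen order
step :: Eq a => ([a], [a]) -> a -> ([a], [a])
step (candidates, results) x
  | x `notElem` candidates = (candidates, results)                   -- ignore
  | x `elem` results       = (delete x candidates, delete x results) -- seen twice
  | otherwise              = (candidates, results ++ [x])            -- first sighting

-- Every intermediate state, starting from the full domain and no results.
traceStates :: Eq a => [a] -> [a] -> [([a], [a])]
traceStates domain = scanl step (domain, [])
```

With domain `[1,2,3]` and input `[1,2,1]`, the final state is `([2,3],[2])`: 1 has been dropped as a repeat, and 2 is the only remaining result.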


CA McCann Apr 17 '13 at 10:42


First of all, if your function is designed to return at most one element, you should almost certainly use Maybe a instead of [a] for the result.

Second, you have no choice but to go through the whole list: you cannot say for sure whether a given element is really unique until you have looked at all the others.

If your elements have no Ord instance and can only be tested for equality (Eq), you really have no better option than something like:

    firstUnique :: Eq a => [a] -> Maybe a
    firstUnique (x:xs)
      | elem x xs = firstUnique (filter (/= x) xs)
      | otherwise = Just x
    firstUnique [] = Nothing

Note that the filter is what stops later copies of x from being mistaken for unique elements, so it is needed for correctness; it does not improve the worst case, though, which is quadratic either way.
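The following sketch contrasts the version above with a hypothetical variant, `firstUniqueNoFilter`, that skips the filter, to show why removing the other copies of x matters: without it, the last copy of a repeated element is reported as unique.

```haskell
-- The Eq-only search above: after finding a duplicate of x,
-- drop every remaining copy of x before continuing.
firstUnique :: Eq a => [a] -> Maybe a
firstUnique (x:xs)
  | x `elem` xs = firstUnique (filter (/= x) xs)
  | otherwise   = Just x
firstUnique []  = Nothing

-- Hypothetical variant WITHOUT the filter: it simply moves on,
-- so the final copy of a repeated element looks unique.
firstUniqueNoFilter :: Eq a => [a] -> Maybe a
firstUniqueNoFilter (x:xs)
  | x `elem` xs = firstUniqueNoFilter xs
  | otherwise   = Just x
firstUniqueNoFilter [] = Nothing
```

On `[1,1,2]`, `firstUnique` gives `Just 2`, while `firstUniqueNoFilter` wrongly gives `Just 1`.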

Edit:

The code above misses the early exit made possible by the small, known set of possible elements. Note, however, that in the worst case you still have to go through the entire list: all it takes is for at least one of those possible elements to be missing from the list (or to appear only once)...

Still, here is an implementation that exits early once every possible element has been seen at least twice:

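    import Data.List (delete)

    -- domain: the small, known set of possible elements
    firstUnique :: Eq a => [a] -> [a] -> Maybe a
    firstUnique domain = f [] domain
      where
        -- uniques: seen exactly once so far, in first-seen order
        -- noshows: possible elements not seen yet
        f [] [] _ = Nothing -- early out
        f uniques noshows (x:xs)
          | elem x uniques = f (delete x uniques) noshows xs
          | elem x noshows = f (uniques ++ [x]) (delete x noshows) xs
          | otherwise = f uniques noshows xs
        f [] _ [] = Nothing
        f (u:_) _ [] = Just u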

Note that any elements in your list that should not be there (because they are not in the small, known set) are simply ignored by the code above...
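Because the early out fires as soon as every possible element has been seen twice, this version can return even on an infinite input. The sketch below assumes the domain is passed as an explicit first argument (the answer leaves it as a placeholder) and keeps `uniques` in first-seen order so the first unique element is the one returned.

```haskell
import Data.List (delete)

-- Assumes the domain of possible elements is passed explicitly.
firstUnique :: Eq a => [a] -> [a] -> Maybe a
firstUnique domain = f [] domain
  where
    -- early out: every possible element has been seen at least twice
    f [] [] _ = Nothing
    f uniques noshows (x:xs)
      | x `elem` uniques = f (delete x uniques) noshows xs      -- second sighting
      | x `elem` noshows = f (uniques ++ [x]) (delete x noshows) xs -- first sighting
      | otherwise        = f uniques noshows xs                 -- repeat or not in domain
    f [] _ [] = Nothing
    f (u:_) _ [] = Just u
```

For example, `firstUnique "ab" (cycle "ab")` terminates with `Nothing` after four elements, and `firstUnique [1,2] [1,9,2]` ignores the out-of-domain 9 and returns `Just 1`.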


comingstorm Apr 17 '13 at 10:58


As others have already said, without any additional constraints you cannot do this in less than quadratic time, because without knowing anything about the elements you cannot keep them in any reasonable data structure.

If we can compare the elements, the obvious O(n log n) solution is to count the occurrences of each element first and then find the first one with a count of 1:

    import Data.List (foldl', find)
    import Data.Map (Map)
    import qualified Data.Map as Map
    import Data.Maybe (fromMaybe)

    count :: (Ord a) => Map a Int -> a -> Int
    count m x = fromMaybe 0 $ Map.lookup x m

    add :: (Ord a) => Map a Int -> a -> Map a Int
    add m x = Map.insertWith (+) x 1 m

    uniq :: (Ord a) => [a] -> Maybe a
    uniq xs = find (\x -> count cs x == 1) xs
      where cs = foldl' add Map.empty xs

Note that the log n factor comes from working with a Map of size n. If there are only k distinct elements in the list, the map will have at most k entries, so the total complexity is only O(n log k).
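The counting pass can also be written with `Data.Map.Strict.fromListWith` from the containers package, which builds the same count map in one expression. This is a sketch of an equivalent formulation rather than the answer's code; `uniqMap` is a hypothetical name. It scans the original list, so the result respects list order.

```haskell
import Data.List (find)
import qualified Data.Map.Strict as Map

-- Count every element, then return the first list element counted once.
uniqMap :: Ord a => [a] -> Maybe a
uniqMap xs = find (\x -> Map.lookup x counts == Just 1) xs
  where
    counts = Map.fromListWith (+) [(y, 1 :: Int) | y <- xs]
```

On the question's example list, `uniqMap` returns `Just 1`; the complexity is the same O(n log k) as above.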

However, we can do even better: we can use a hash table instead of a map to get an O(n) solution (expected time, assuming a good hash function). To do this we need the ST monad to perform mutable operations on the hash table, and our elements must be Hashable. The solution is basically the same as before, just a bit more involved because it works in the ST monad:

    import Control.Monad
    import Control.Monad.ST
    import Data.Hashable
    import qualified Data.HashTable.ST.Basic as HT
    import Data.Maybe (fromMaybe)

    count :: (Eq a, Hashable a) => HT.HashTable s a Int -> a -> ST s Int
    count ht x = liftM (fromMaybe 0) (HT.lookup ht x)

    add :: (Eq a, Hashable a) => HT.HashTable s a Int -> a -> ST s ()
    add ht x = count ht x >>= HT.insert ht x . (+ 1)

    uniq :: (Eq a, Hashable a) => [a] -> Maybe a
    uniq xs = runST $ do
        -- Count all elements into a hash table:
        ht <- HT.newSized (length xs)
        forM_ xs (add ht)
        -- Find the first one with count 1
        first (\x -> liftM (== 1) (count ht x)) xs

    -- Monadic variant of find which exits once an element is found.
    first :: (Monad m) => (a -> m Bool) -> [a] -> m (Maybe a)
    first p = f
      where
        f [] = return Nothing
        f (x:xs') = do
            b <- p x
            if b then return (Just x) else f xs'

Notes:

If you know that there will only be a small number of distinct items in the list, you can use HT.new instead of HT.newSized (length xs). This saves some memory and one pass over xs, but if there are many distinct elements, the hash table will have to be resized several times.
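If the element type is also an instance of Enum and Bounded (an extra assumption, not something this answer requires), a mutable unboxed array in ST can stand in for the hash table, giving O(n + |A|) time with no hashing. This sketch uses `Data.Array.ST` from the array package; `uniqST` is a hypothetical name. It is only sensible when the type has few values: for Int it would try to allocate an enormous array.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Monad (forM_)
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, newArray, readArray, writeArray)

-- Assumes Enum + Bounded, so every value maps to an Int index
-- between fromEnum minBound and fromEnum maxBound.
uniqST :: forall a. (Enum a, Bounded a) => [a] -> Maybe a
uniqST xs = runST $ do
  let lo = fromEnum (minBound :: a)
      hi = fromEnum (maxBound :: a)
  -- One unboxed counter per possible value, all starting at 0.
  counts <- newArray (lo, hi) 0 :: ST s (STUArray s Int Int)
  -- First pass: count occurrences.
  forM_ xs $ \x -> do
    let i = fromEnum x
    c <- readArray counts i
    writeArray counts i (c + 1)
  -- Second pass: first element, in list order, whose count is 1.
  let go [] = return Nothing
      go (y:ys) = do
        c <- readArray counts (fromEnum y)
        if c == 1 then return (Just y) else go ys
  go xs
```

For example, `uniqST [LT, GT, LT, EQ]` returns `Just GT`.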


Petr pudlák Apr 18 '13 at 7:52


Here is a version that does the trick:

    unique :: Eq a => [a] -> [a]
    unique = select . collect []
      where
        collect acc [] = acc
        collect acc (x : xs) = collect (insert x acc) xs

        insert x [] = [[x]]
        insert x (ys@(y : _) : yss)
          | x == y = (x : ys) : yss
          | otherwise = ys : insert x yss

        select [] = []
        select ([x] : _) = [x]
        select ((_ : _) : xss) = select xss

So first we traverse the input list (collect), maintaining a list of buckets of equal elements that we update with insert. Then we simply select the first element that appears in a singleton bucket (select).

The bad news is that this takes quadratic time: for each element visited in collect, we need to go through the list of buckets. I am afraid this is the price you pay for restricting the element type to Eq.
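Since a bucket only needs to tell whether it holds one element or more, one possible refinement (an assumption on my part, not part of the answer) is to store a representative and a count instead of the full list of equal elements. It is still quadratic in the worst case, because each element is compared against every bucket, but it keeps just one pair per distinct element. `uniqueCounts` is a hypothetical name.

```haskell
import Data.List (foldl')

-- Buckets as (representative, count) pairs, in first-seen order.
uniqueCounts :: Eq a => [a] -> [a]
uniqueCounts = select . foldl' insert []
  where
    -- Walk the buckets: bump the matching count or append a new bucket.
    insert [] x = [(x, 1 :: Int)]
    insert ((y, n) : rest) x
      | x == y    = (y, n + 1) : rest
      | otherwise = (y, n) : insert rest x
    -- The first bucket whose count is 1, as a singleton list (or []).
    select buckets = take 1 [y | (y, 1) <- buckets]
```

It keeps the question's `[a]` result type, so `uniqueCounts "aab"` returns `"b"` and `uniqueCounts "aa"` returns `""`.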


Stefan holdermans Apr 18 '13 at 3:10


Something like this looks pretty good.

    import Data.List (foldl', delete)

    unique :: Eq a => [a] -> [a]
    unique = take 1 . reverse . fst . foldl'
      (\(a, b) c -> if c `elem` b then (a, b)
                    else if c `elem` a then (delete c a, c : b)
                    else (c : a, b))
      ([], [])

The fold accumulates a pair. The first component holds the elements seen exactly once so far, most recent first, so the first unique element is the last one in that list. The second component is the memory of the process: it records which elements have already been discarded because they occurred more than once.

About space performance.
Given the problem as stated, every element of the list must be visited at least once before the result can be produced. Besides the good values, the algorithm must also keep track of the discarded ones, but each discarded value is recorded only once. So in the worst case the memory required is proportional to the number of distinct elements, which is at most the size of the input list. That seems acceptable and, as you said, expected.

About time performance.
Since the expected input is small and not sorted, sorting the list inside the algorithm (or beforehand) does not help. In fact, we can say almost statically that the extra work of placing an element at its ordered position (in the a and b components of the tuple (a, b)) would cost about as much as checking whether the element appears in the list at all.

Below is a nicer, more explicit version of the foldl' one.

    import Data.List (foldl', delete)

    unique :: Eq a => [a] -> [a]
    unique = take 1 . reverse . fst . foldl' algorithm ([], [])
      where
        algorithm (result0, memory0) current =
          if current `elem` memory0
            then (result0, memory0)
            else if current `elem` result0
              then (delete current result0, memory)
              else (result, memory0)
          where
            result = current : result0
            memory = current : memory0

In the nested if ... then ... else ..., the result list is traversed twice in the worst case (once by elem and once by delete); this can be avoided with the following helper function, which finds and removes the element in a single pass.

    unique' :: Eq a => [a] -> [a]
    unique' = take 1 . reverse . fst . foldl' algorithm ([], [])
      where
        algorithm (result, memory) current =
          if current `elem` memory
            then (result, memory)
            else helper current result memory []
        -- acc holds the already-scanned prefix of result, reversed
        helper current [] memory acc = (current : reverse acc, memory)
        helper current (r:rs) memory acc
          | current == r = (reverse acc ++ rs, current : memory)
          | otherwise    = helper current rs memory (r : acc)

The helper can also be written as a single fold, which is arguably cleaner:

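    -- Fold-based helper: one pass over result (foldr keeps the order);
    -- call it from algorithm as `helper current result memory`.
    helper current result memory
      | found     = (kept, current : memory)    -- second sighting
      | otherwise = (current : result, memory)  -- first sighting
      where
        (kept, found) = foldr step ([], False) result
        step x (ks, seen)
          | x == current = (ks, True)
          | otherwise    = (x : ks, seen)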


zurgl Apr 17 '13 at 10:58


luqui · Accepted Answer · 2013-04-18T08:37:31+0000

Okay: linear time, over a finite domain. The running time will be O((m + d) log d), where m is the length of the list and d is the size of the domain, which is linear when d is fixed. My plan is to use the set elements as trie keys, with counts as values, and then look up the elements with count 1 in the trie.

    import qualified Data.IntTrie as IntTrie
    import Data.List (foldl')
    import Control.Applicative

Count each of the elements. This traverses the list once, builds a trie of the results (O(m log d)), and then returns a function that looks up a result in the trie (in O(log d) time).

    counts :: (Enum a) => [a] -> (a -> Int)
    counts xs = IntTrie.apply (foldl' insert (pure 0) xs) . fromEnum
      where
        insert t x = IntTrie.modify' (fromEnum x) (+1) t

We use the Enum constraint to convert values of type a to integers so we can index the trie with them. The Enum instance is part of the evidence for your assumption that a is a small, finite set (Bounded would be the other part, but see below).

And then find those that are unique.

    uniques :: (Eq a, Enum a) => [a] -> [a] -> [a]
    uniques dom xs = filter (\x -> cts x == 1) dom
      where
        cts = counts xs

This function takes as its first parameter an enumeration of the entire domain. We could require a Bounded a constraint and use [minBound .. maxBound] instead, which is semantically attractive to me, since "finite" is essentially Enum + Bounded, but it is rather inflexible, because then the domain must be known at compile time. So I would choose this slightly uglier but more flexible option.
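For comparison, here is a sketch of the Bounded variant mentioned above, using `[minBound .. maxBound]` as the domain. To depend only on libraries bundled with GHC, it swaps `Data.IntMap.Strict` (from containers) in for IntTrie, so lookups are O(min(n, W)) rather than O(log d); `countsIM` and `uniquesBounded` are hypothetical names.

```haskell
import Data.List (foldl')
import qualified Data.IntMap.Strict as IM

-- Count occurrences keyed by fromEnum (IntMap stands in for IntTrie).
countsIM :: Enum a => [a] -> (a -> Int)
countsIM xs = \x -> IM.findWithDefault 0 (fromEnum x) table
  where
    table = foldl' (\t y -> IM.insertWith (+) (fromEnum y) 1 t) IM.empty xs

-- The Bounded variant: the domain is every value of the type.
uniquesBounded :: (Enum a, Bounded a) => [a] -> [a]
uniquesBounded xs = filter (\x -> cts x == 1) [minBound .. maxBound]
  where
    cts = countsIM xs
```

Note that results come out in domain order: `uniquesBounded [EQ, LT]` returns `[LT, EQ]`.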

uniques traverses dom once (lazily, so head . uniques dom only goes as far as it needs to find the first unique element; first in dom, not in the list). For each element it visits, the lookup function we built is O(log d), so the filter takes O(d log d), and building the count table takes O(m log d). So uniques runs in O((m + d) log d), which is linear when d is fixed. It has to take at least Ω(m log d) to get any information out of it, because it must traverse the entire list to build the table (you have to go all the way to the end of the list to see whether an element was repeated, so you cannot do better than that).
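One consequence of filtering dom is that head . uniques dom gives the first unique element in domain order, which can differ from the first unique element in list order (what the question's own unique computes). If list order is wanted, the same count table can filter the input list instead. The sketch below uses a Data.Map count as a stand-in for IntTrie (so it needs Ord rather than Enum); `countOf`, `firstByDomain` and `firstByList` are hypothetical names.

```haskell
import qualified Data.Map.Strict as Map
import Data.Maybe (listToMaybe)

-- Counting table (Data.Map stands in for the answer's IntTrie here).
countOf :: Ord a => [a] -> a -> Int
countOf xs = \x -> Map.findWithDefault 0 x table
  where
    table = Map.fromListWith (+) [(y, 1) | y <- xs]

-- First unique in DOMAIN order (what head . uniques dom gives).
firstByDomain :: Ord a => [a] -> [a] -> Maybe a
firstByDomain dom xs = listToMaybe (filter ((== 1) . countOf xs) dom)

-- First unique in LIST order: filter the input instead of the domain.
firstByList :: Ord a => [a] -> Maybe a
firstByList xs = listToMaybe (filter ((== 1) . countOf xs) xs)
```

With domain `[1,2]` and input `[2,1]`, `firstByDomain` returns `Just 1` while `firstByList` returns `Just 2`.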
