Find common items in two sorted lists in linear time

Question

I have two sorted lists as input:

    let x = [2; 4; 6; 8; 8; 10; 12]
    let y = [-8; -7; 2; 2; 3; 4; 4; 8; 8; 8]

I want to write a function that behaves like SQL INNER JOIN. In other words, I want to return the Cartesian product of x and y, restricted to the elements shared by both lists:

 join(x, y) = [2; 2; 4; 4; 8; 8; 8; 8; 8; 8]

I wrote a naive version as follows:

    let join x y =
        [for x' in x do
            for y' in y do
                yield (x', y')]
        |> List.choose (fun (x, y) -> if x = y then Some x else None)

It works, but it runs in O(x.length * y.length). Since both lists are sorted, I think it should be possible to get the result I want in O(min(x.length, y.length)).

How can I find common items in two sorted lists in linear time?

+6

language-agnostic algorithm f#

Juliet Sep 25 '09 at 20:00

9 answers

I can't help you with F#, but the main idea is to use two indexes, one for each list. Take the item at the current index of each list. If the two items have the same value, add this value to your result set and advance both indexes. If the items have different values, advance only the index of the list containing the smaller of the two values. Repeat the comparison until one of the lists is exhausted, and then return the result set.
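The two-index scan described above can be sketched in Python as follows. This is a minimal illustration, not the answerer's own code; the name `common_items` is made up for the example. Note that, taken literally, the description consumes one item from each list per match, so it does not multiply the multiplicities the way the question's join does.

```python
def common_items(xs, ys):
    """Two-index scan over two sorted lists (hypothetical helper name)."""
    i, j = 0, 0
    result = []
    while i < len(xs) and j < len(ys):
        if xs[i] == ys[j]:
            # Same value in both lists: record it and advance both indexes.
            result.append(xs[i])
            i += 1
            j += 1
        elif xs[i] < ys[j]:
            # Advance the index of the list holding the smaller value.
            i += 1
        else:
            j += 1
    return result


x = [2, 4, 6, 8, 8, 10, 12]
y = [-8, -7, 2, 2, 3, 4, 4, 8, 8, 8]
print(common_items(x, y))  # [2, 4, 8, 8]
```

Each step advances at least one index, so the loop runs at most len(xs) + len(ys) times.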

+8

tvanfosson Sep 25 '09 at 20:06

The following is also tail recursive (as far as I can tell), but as a consequence the result list comes out reversed:

    let rec merge xs ys acc =
        match (xs, ys) with
        | ((x :: xt), (y :: yt)) ->
            if x = y then
                let rec count_and_remove_leading zs acc =
                    match zs with
                    | z :: zt when z = x -> count_and_remove_leading zt (acc + 1)
                    | _ -> (acc, zs)
                let rec replicate_and_prepend zs n =
                    if n = 0 then zs
                    else replicate_and_prepend (x :: zs) (n - 1)
                let xn, xt = count_and_remove_leading xs 0
                let yn, yt = count_and_remove_leading ys 0
                merge xt yt (replicate_and_prepend acc (xn * yn))
            else if x < y then merge xt ys acc
            else merge xs yt acc
        | _ -> acc

    let xs = [2; 4; 6; 8; 8; 10; 12]
    let ys = [-7; -8; 2; 2; 3; 4; 4; 8; 8; 8;]
    printf "%A" (merge xs ys [])

Output:

[8; 8; 8; 8; 8; 8; 4; 4; 2; 2]

Note that, as sdcvvc says in his answer, this is still O(x.length * y.length) in the worst case, simply because in the edge case of two lists made up of the same repeated element, x.length * y.length values have to be created in the output list, which is in itself an O(m*n) operation.

+2

Pavel Minaev Sep 25 '09 at 20:42

I do not know F#, but I assume that it has arrays and a binary search over arrays (which can also be implemented by hand).

1) choose the smaller list
2) copy it to an array (for O(1) random access; if F# already gives you this, you can skip this step)
3) go through the larger list and, using binary search, look up each of its elements in the small array
4) if found, add it to the list of results

Complexity is O(min + max * log min), where min is the size of the smaller list and max is the size of the larger list.
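The steps above might look like this in Python, using the standard `bisect` module for the binary search. It is only a sketch of the described idea; the function name `common_by_binary_search` is an assumption made for the example. As described, it emits each element of the larger list that is found in the smaller one, so it does not produce the product of multiplicities the question asks for.

```python
from bisect import bisect_left


def common_by_binary_search(a, b):
    """Look up every element of the larger list in the smaller one."""
    # Pick the smaller list and copy it for O(1) random access.
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    small = list(small)
    result = []
    for item in large:
        # bisect_left returns the leftmost position where item could be inserted.
        pos = bisect_left(small, item)
        if pos < len(small) and small[pos] == item:
            result.append(item)
    return result


x = [2, 4, 6, 8, 8, 10, 12]
y = [-8, -7, 2, 2, 3, 4, 4, 8, 8, 8]
print(common_by_binary_search(x, y))  # [2, 2, 4, 4, 8, 8, 8]
```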

+2

Igor Apr 11 '11 at 9:42

I do not know F#, but I can provide a functional Haskell implementation based on the algorithm described by tvanfosson (and further specified by Lasse V. Karlsen).

    import Data.List

    join :: (Ord a) => [a] -> [a] -> [a]
    join l r = gjoin (group l) (group r)
      where
        gjoin [] _ = []
        gjoin _ [] = []
        gjoin l@(lh@(x:_):xs) r@(rh@(y:_):ys)
          | x == y    = replicate (length lh * length rh) x ++ gjoin xs ys
          | x < y     = gjoin xs r
          | otherwise = gjoin l ys

    main :: IO ()
    main = print $ join [2, 4, 6, 8, 8, 10, 12] [-7, -8, 2, 2, 3, 4, 4, 8, 8, 8]

It prints [2,2,4,4,8,8,8,8,8,8]. In case you are not familiar with Haskell, some links to documentation:

+1

Stephan202 Sep 25 '09 at 20:30

I think this can be done simply using hash tables. The hash tables store the frequency of each element in each list. They are then used to build a list in which each element e occurs as many times as the frequency of e in X times the frequency of e in Y. This has complexity O(n + m).

(EDIT: After reading comments on other posts, I noticed that this can be O(n^2) in the worst case. Something very similar to this has already been posted. Sorry for the duplicate. I hope the code helps.)

I do not know F#, so I am adding Python code. I hope the code is readable enough to be converted to F# easily.

    def join(x, y):
        # Count how often each element occurs in each list.
        x_count = dict()
        y_count = dict()
        for elem in x:
            x_count[elem] = x_count.get(elem, 0) + 1
        for elem in y:
            y_count[elem] = y_count.get(elem, 0) + 1
        # Emit each shared element (count in x) * (count in y) times.
        answer = []
        for elem in x_count:
            if elem in y_count:
                answer.extend([elem] * (x_count[elem] * y_count[elem]))
        return answer

    A = [2, 4, 6, 8, 8, 10, 12]
    B = [-8, -7, 2, 2, 3, 4, 4, 8, 8, 8]
    print(join(A, B))

+1

MAK Sep 26 '09 at 9:53

The problem with what the asker wants is that the function inevitably has to re-traverse part of a list.

To get 8, 8, 8 emitted twice, the function has to loop over part of the second list again. The worst case (two identical lists of a single repeated value) is still O(x * y).

As a side note, this does not use external functions that loop on their own.

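    // Index of the first element of longerList that can still match.
    int start = 0;
    for (int i = 0; i < shorterList.Length; i++)
    {
        // Nothing in longerList can match once we are past its largest value.
        if (shorterList[i] > longerList[longerList.Length - 1])
            break;
        // Skip elements too small to match this or any later item.
        while (start < longerList.Length && longerList[start] < shorterList[i])
            start++;
        // Re-scan the run of equal values for every duplicate in shorterList.
        for (int j = start; j < longerList.Length && longerList[j] <= shorterList[i]; j++)
        {
            if (shorterList[i] == longerList[j])
                retList.Add(shorterList[i]);
        }
    }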

0

Justoren Sep 25 '09 at 20:33

I think this is O(n) in the intersect/join code, although the whole thing goes through each list twice:

    // list unique elements and their multiplicity (also reverses sorting)
    // e.g. pack y = [(8, 3); (4, 2); (3, 1); (2, 2); (-8, 1); (-7, 1)]
    // we assume xs is ordered
    let pack xs =
        Seq.fold (fun acc x ->
            match acc with
            | (y,ny) :: tl -> if y=x then (x,ny+1) :: tl else (x,1) :: acc
            | [] -> [(x,1)]) [] xs

    let unpack px = [ for (x,nx) in px do for i in 1 .. nx do yield x ]

    // for lists of (x,nx) and (y,ny), returns list of (x,nx*ny) when x=y
    // assumes inputs are sorted descending (from pack function)
    // and returns results sorted ascending
    let intersect_mult xs ys =
        let rec aux rx ry acc =
            match (rx,ry) with
            | (x,nx)::xtl, (y,ny)::ytl ->
                if x = y then aux xtl ytl ((x,nx*ny) :: acc)
                elif x < y then aux rx ytl acc
                else aux xtl ry acc
            | _,_ -> acc
        aux xs ys []

    let inner_join x y = intersect_mult (pack x) (pack y) |> unpack

Now we test it on your sample data.

    let x = [2; 4; 6; 8; 8; 10; 12]
    let y = [-7; -8; 2; 2; 3; 4; 4; 8; 8; 8;]

    > inner_join x y;;
    val it : int list = [2; 2; 4; 4; 8; 8; 8; 8; 8; 8]

EDIT: I just realized that this is the same idea as the previous sdcvvc answer (after editing).

0

Gabriel Sep 26 '09 at 7:40

You cannot get O(min(x.length, y.length)), because the output may be larger than that. Suppose, for example, that all elements of x and y are equal. Then the output size is the product of the sizes of x and y, which gives a lower bound on the running time of the algorithm.

Here's the algorithm in F#. It is not tail recursive, though that can be easily fixed. The trick is to use mutual recursion. Also note that I may swap the order of the lists passed to prod to avoid unnecessary work.

    let rec prod xs ys =
        match xs with
        | [] -> []
        | z :: zs -> reps xs ys ys
    and reps xs ys zs =
        match zs with
        | w :: ws when xs.Head = w -> w :: reps xs ys ws
        | w :: ws when xs.Head > w -> reps xs ys ws
        // zs is exhausted or past xs.Head: move on to the next element
        | _ ->
            match ys with
            | [] -> []
            | y :: yss ->
                if y < xs.Head then prod ys xs.Tail
                else prod xs.Tail ys

The original algorithm in Scala:

    def prod(x: List[Int], y: List[Int]): List[Int] = x match {
      case Nil => Nil
      case z :: zs => reps(x, y, y)
    }

    def reps(x: List[Int], y: List[Int], z: List[Int]): List[Int] = z match {
      case w :: ws if x.head == w => w :: reps(x, y, ws)
      case w :: ws if x.head > w => reps(x, y, ws)
      case _ => y match {
        case Nil => Nil
        case y1 :: ys if y1 < x.head => prod(y, x.tail)
        case _ => prod(x.tail, y)
      }
    }

0

Daniel C. Sobral Sep 26 '09 at 14:01

sdcvvc · Accepted Answer · 2009-09-25T20:27:57+0000

O(min(n, m)) is impossible: take two lists [x; x; ...; x; y] and [x; x; ...; x; z]. You must look at both lists to the end to compare y and z.

Even O(n + m) is impossible. Take [1, 1, ..., 1] (n times) and [1, 1, ..., 1] (m times). Then the resulting list must have n * m elements. You need at least O(nm) time (more correctly, Omega(nm)) to create such a list.
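The size argument is easy to check on a small case. The Python snippet below is only an illustration; `naive_join` is a hypothetical stand-in for the quadratic join from the question.

```python
def naive_join(xs, ys):
    """Quadratic join: one output item per equal (x, y) pair."""
    return [a for a in xs for b in ys if a == b]


n, m = 4, 3
out = naive_join([1] * n, [1] * m)
print(len(out))  # 12, i.e. n * m
```

Any algorithm has to write all n * m output items here, whatever strategy it uses to find the matches.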

Without the Cartesian product (a plain merge) it is quite simple. OCaml code (I don't know F#, but it should be close enough; compiled, but not tested):

    let rec merge a b = match (a,b) with
        ([], xs) -> xs
      | (xs, []) -> xs
      | (x::xs, y::ys) ->
          if x <= y then x::(merge xs (y::ys))
          else y::(merge (x::xs) ys);;

(Edit: I'm late)

So your O(nm) code is the best possible in the worst case. However, IIUC, it always performs n * m operations, which is not optimal.

My approach will be

1) write a function

group: 'a list → ('a * int) list

which counts the number of identical elements:

group [1,1,1,1,1,2,2,3] == [(1,5); (2,2); (3,1)]

2) use it to merge both lists using the same code as before (there you can multiply the counts)

3) write a function

ungroup: ('a * int) list → 'a list

and compose these three.

This has complexity O(n + m + x), where x is the length of the resulting list. This is the best possible, up to a constant factor.
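For readers who prefer Python, the three steps can be sketched as below. This is a loose illustration of the group / merge / ungroup idea rather than a translation of the answer's code; `merge_counts` is a name chosen for the example, and `itertools.groupby` stands in for a hand-written `group`.

```python
from itertools import groupby


def group(xs):
    """Collapse runs of equal items into (value, count) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(xs)]


def merge_counts(gx, gy):
    """Walk both grouped lists in step; multiply counts of shared values."""
    i, j, out = 0, 0, []
    while i < len(gx) and j < len(gy):
        (x, xn), (y, yn) = gx[i], gy[j]
        if x == y:
            out.append((x, xn * yn))
            i += 1
            j += 1
        elif x < y:
            i += 1
        else:
            j += 1
    return out


def ungroup(groups):
    """Expand (value, count) pairs back into a flat list."""
    return [value for value, count in groups for _ in range(count)]


def crossjoin(xs, ys):
    return ungroup(merge_counts(group(xs), group(ys)))


x = [2, 4, 6, 8, 8, 10, 12]
y = [-8, -7, 2, 2, 3, 4, 4, 8, 8, 8]
print(crossjoin(x, y))  # [2, 2, 4, 4, 8, 8, 8, 8, 8, 8]
```

Grouping costs O(n + m), the merge visits each group once, and ungrouping costs O(x) for an output of length x, which matches the stated bound.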

Edit: here you are:

    let group x =
      let rec group2 l m =
        match l with
        | [] -> []
        | a1::a2::r when a1 == a2 -> group2 (a2::r) (m+1)
        | x::r -> (x, m+1)::(group2 r 0)
      in group2 x 0;;

    let rec merge a b = match (a,b) with
        ([], xs) -> []
      | (xs, []) -> []
      | ((x, xm)::xs, (y, ym)::ys) ->
          if x == y then (x, xm*ym)::(merge xs ys)
          else if x < y then merge xs ((y, ym)::ys)
          else merge ((x, xm)::xs) ys;;

    let rec ungroup a = match a with
        [] -> []
      | (x, 0)::l -> ungroup l
      | (x, m)::l -> x::(ungroup ((x,m-1)::l));;

    let crossjoin x y = ungroup (merge (group x) (group y));;

    # crossjoin [2; 4; 6; 8; 8; 10; 12] [-7; -8; 2; 2; 3; 4; 4; 8; 8; 8;];;
    - : int list = [2; 2; 4; 4; 8; 8; 8; 8; 8; 8]
