Efficient List Crossing Algorithm

Question

Efficient List Crossing Algorithm

Given two lists (not necessarily sorted), what is the most efficient non-recursive algorithm for finding the intersection of these lists?

+70

list algorithm set-intersection

David Jan 30 '09 at 21:35

source share

15 answers

Frank · Answer 1 · 2009-01-30 21:39

You can put all the elements of the first list in a hash set. Then iterate the second one, and for each of its elements check the hash to see if it exists in the first list. If so, output it as an intersection element.

Aneil Mallavarapu · Answer 2 · 2010-05-23 16:22

You might want to take a look at Bloom filters. These are bit vectors that give a probabilistic answer whether an element is a member of a set. To establish an intersection can be implemented using a simple bitwise operation I. If you have a large number of zero intersections, the Bloom filter can help you quickly eliminate them. However, you still have to resort to one of the other algorithms mentioned here to calculate the actual intersection. http://en.wikipedia.org/wiki/Bloom_filter

Tom Ritter · Answer 3 · 2009-01-30 21:49

without hashing, I suppose you have two options:

The naive way is to compare each element with every other element. O (N ^ 2)
Another way would be to sort the lists first, then iterate over them: O (n lg n) * 2 + 2 * O (n)

zvrba · Answer 4 · 2010-05-23 16:56

From the list of eviews functions, it seems that it supports complex merges and unions (if it is a “join”, as in DB terminology, it will calculate the intersection). Now review your documentation :-)

In addition, eviews has its own user forum - why not ask there _

khaja · Answer 5 · 2011-04-21 02:23

using set 1, build a binary search tree with O(log n) and iterate set2 and search BST m XO(log n) , so that total O(log n) + O(m)+O(log n) ==> O(log n)(m+1)

quasar · Answer 6 · 2010-05-07 04:41

in C ++ you can try the following: STL map

 vector<int> set_intersection(vector<int> s1, vector<int> s2){ vector<int> ret; map<int, bool> store; for(int i=0; i < s1.size(); i++){ store[s1[i]] = true; } for(int i=0; i < s2.size(); i++){ if(store[s2[i]] == true) ret.push_back(s2[i]); } return ret; }

Ayman Farhat · Answer 7 · 2013-01-04 19:55

Here is another possible solution I came across that makes O (nlogn) in time complexity and without any additional storage. You can check it out here https://gist.github.com/4455373

Here's how it works. Assuming the sets do not contain repetitions, merge all the sets into one and sort them. Then skip the combined set and at each iteration create a subset between the current index i and i + n, where n is the number of sets available in the universe. What we are looking for as a loop is a repeating sequence of size n equal to the number of sets in the universe.

If this subset in i is equal to this subset in n, it means that the element at i repeats n times, which is equal to the total number of sets. And since there are no repetitions in any set, which means that each of the sets contains this value, so we add it to the intersection. Then we shift the index by i + what remains between it and n, because definitely not one of these indices is going to form a repeating sequence.

Wookai · Answer 8 · 2009-01-30 21:47

Sort both lists first with quicksort: O (n * log (n). Then compare the lists by looking at the lowest values and add common values. For example, in lua):

 function findIntersection(l1, l2) i, j = 1,1 intersect = {} while i < #l1 and j < #l2 do if l1[i] == l2[i] then i, j = i + 1, j + 1 table.insert(intersect, l1[i]) else if l1[i] > l2[j] then l1, l2 = l2, l1 i, j = j, i else i = i + 1 end end return intersect end

which is O(max(n, m)) , where n and m are the sizes of the lists.

EDIT: quicksort is recursive, as said in the comments, but it seems like a non-recursive implementation

Imran · Answer 9 · 2009-01-30 21:53

Why not implement your own hash table or hash set? Avoid crossing nlogn if your lists are large, as you say.

Since you know a little about your data, you should be able to choose a good hash function.

Andrea Ambu · Answer 10 · 2009-01-30 21:54

If there is support for sets (as you call them in the name) as built-in, usually there is an intersection method.

In any case, as someone said that you can do it easily (I won’t send the code, someone has already done this) if you have sorted lists. If you cannot use recursion, there is no problem. There are options for quick sorting without recursion .

Nosredna · Answer 11 · 2009-01-31 22:55

Secondly, the idea of "sets." In JavaScript, you can use the first list to populate an object using list items as names. Then you use the list items from the second list and see if these properties exist.

StingyJack · Answer 12 · 2009-01-30 21:51

I have good answers from this that you can submit. I don’t have the opportunity to try them yet, but since they also cover intersections, you may find them useful.

score 0 · Answer 13 · 2009-07-16 12:47

In PHP, something like

 function intersect($X) { // X is an array of arrays; returns intersection of all the arrays $counts = Array(); $result = Array(); foreach ($X AS $x) { foreach ($x AS $y) { $counts[$y]++; } } foreach ($counts AS $x => $count) { if ($count == count($X)) { $result[] = $x; } } return $result; }

Adelin · Answer 14 · 2015-10-19 10:08

From the definition of the Big-Oh notation:

T (N) = O (f (N)) if there exist positive constants c and n 0 such that T (N) ≤ cf (N) when N ≥ n 0.

Which in practice means that if two lists are relatively small in size, they say that less than 100 elements in each of the two for loops work just fine. Complete the first list and find a similar object in the second. In my case, this works fine, because I will not have more than 10 - 20 maximum items in my lists. However, a good solution is to sort the first O (n log n), sort the second also O (n log n) and combine them, another O (n log n), roughly speaking O (3 n log n), let's say that the two lists have same size.

Wolf Garbe · Answer 15 · 2016-06-16 14:11

Using skip pointers and SSE instructions can improve the list crossing efficiency.

Efficient List Crossing Algorithm

More articles: