Check if lists have any elements in python

I want to check if any of the elements are in one list in another list. I can do this simply using the code below, but I suspect there might be a library function for this. If not, is there a more pythonic method for achieving the same result.

In [78]: a = [1, 2, 3, 4, 5] In [79]: b = [8, 7, 6] In [80]: c = [8, 7, 6, 5] In [81]: def lists_overlap(a, b): ....: for i in a: ....: if i in b: ....: return True ....: return False ....: In [82]: lists_overlap(a, b) Out[82]: False In [83]: lists_overlap(a, c) Out[83]: True In [84]: def lists_overlap2(a, b): ....: return len(set(a).intersection(set(b))) > 0 ....: 
+88
python list intersection
Jul 03 '10 at 2:15
source share
9 answers

Short answer : use not set(a).isdisjoint(b) , it is usually the fastest.

There are four common ways to check if two lists a and b exchange any elements. The first option is to convert both to a set and check for their intersection, as such:

 bool(set(a) & set(b)) 

Since the collection is stored using a hash table in Python, search for them O(1) (see here for more information on the complexity of operators in Python). Theoretically, this is O(n+m) on average for n and m objects in lists a and b . But 1) it must first create sets from lists, which can take a small amount of time, and 2) it assumes that hashing conflicts are scarce among your data.

The second way to do this is to use a generator expression that iterates through the lists, for example:

 any(i in a for i in b) 

This allows on-site searches, so no new memory is allocated for intermediate variables. He also helps out at the first find. But the in operator is always O(n) in lists (see here ).

Another proposed option is a hybrid for iterating over one of the list, converting the other into a set and testing for membership in this set, for example:

 a = set(a); any(i in a for i in b) 

The fourth approach is to use the isdisjoint() (frozen) sets method (see here ), for example:

 not set(a).isdisjoint(b) 

If the elements you are looking for are near the beginning of the array (for example, they are sorted), the generator expression is preferable, since the intersection method of the sets should allocate new memory for intermediate variables:

 from timeit import timeit >>> timeit('bool(set(a) & set(b))', setup="a=list(range(1000));b=list(range(1000))", number=100000) 26.077727576019242 >>> timeit('any(i in a for i in b)', setup="a=list(range(1000));b=list(range(1000))", number=100000) 0.16220548999262974 

Here's the runtime graph for this example in the list size function:

Shared item sharing test execution time at the beginning

Note that both axes are logarithmic. This is the best example of expression generator. As you can see, the isdisjoint() method is better for very small list sizes, while a generator expression is better for large list sizes.

On the other hand, since the search starts from the beginning for the hybrid and generator expressions, if the shared element is systematically at the end of the array (or both lists do not have any values), intersecting and given intersection approaches are then faster than the generator expression and hybrid approach .

 >>> timeit('any(i in a for i in b)', setup="a=list(range(1000));b=[x+998 for x in range(999,0,-1)]", number=1000)) 13.739536046981812 >>> timeit('bool(set(a) & set(b))', setup="a=list(range(1000));b=[x+998 for x in range(999,0,-1)]", number=1000)) 0.08102107048034668 

End-to-end sharing test execution time

It is interesting to note that the generator expression is slower for large list sizes. This is only for 1000 repetitions, not 100000 for the previous figure. This setting is also well approximated when no element is shared, and is the best option for non-intersecting and established intersections.

Here are two analyzes using random numbers (instead of setting up the installation in favor of a particular method):

Element sharing test execution time for randomly generated data with a high probability of sharing Element sharing test execution time for randomly generated data with a high probability of sharing

High probability of sharing: elements are randomly taken from [1, 2*len(a)] . Low probability of sharing: items are randomly taken from [1, 1000*len(a)] .

So far, this analysis has assumed that both lists are the same size. In the case of two lists of different sizes, for example a much smaller, isdisjoint() always faster:

Run time for sharing test items in two lists of different sizes when sharing at the beginning Run time for sharing test items in two lists of different sizes when sharing at the end

Make sure list a smaller; otherwise, performance will decrease. In this experiment, the size of the list a was set to 5 .

In short:

  • If the lists are very small (<10 items), not set(a).isdisjoint(b) always the fastest.
  • If the items in the lists are sorted or have a regular structure that you can use with, the any(i in a for i in b) generator expression is the fastest when the list is large;
  • Check the intersection of the set with not set(a).isdisjoint(b) , which is always faster than bool(set(a) & set(b)) .
  • Hybrid "list iteration, set test" a = set(a); any(i in a for i in b) a = set(a); any(i in a for i in b) usually slower than other methods.
  • Generator expression and hybrid are much slower than the other two approaches when it comes to lists without common elements.

In most cases, using the isdisjoint() method is the best approach since the generator expression will take much longer because it is very inefficient when none of the elements are separated.

+209
Jul 18 '13 at 23:06
source share
 def lists_overlap3(a, b): return bool(set(a) & set(b)) 

Note: the above assumes that you want to have a boolean value as an answer. If all you need is an expression to use in an if , just use if set(a) & set(b):

+23
Jul 03 '10 at 2:21
source share
 def lists_overlap(a, b): sb = set(b) return any(el in sb for el in a) 

This is asymptotically optimal (the worst case is O (n + m)) and may be better than the intersection approach due to the short circuit any .

eg:.

 lists_overlap([3,4,5], [1,2,3]) 

will return True as soon as it reaches 3 in sb

EDIT: Another variation (thanks to Dave Kirby):

 def lists_overlap(a, b): sb = set(b) return any(itertools.imap(sb.__contains__, a)) 

It depends on the imap iterator, which is implemented in C, and not for understanding the generator. It also uses sb.__contains__ as a display function. I do not know how noticeable this is. He will still be locked.

+10
Jul 03 '10 at 2:40
source share

You can also use any with a list:

 any([item in a for item in b]) 
+4
Jul 03 '10 at 2:23
source share

In python 2.6 or later you can do:

 return not frozenset(a).isdisjoint(frozenset(b)) 
+3
Nov 13
source share

You can use any built-in / w function to express the generator:

 def list_overlap(a,b): return any(i for i in a if i in b) 

As John and Lee pointed out, this gives incorrect results when for each I separate two lists: bool (i) == False. It should be:

 return any(i in b for i in a) 
+2
Jul 03 '10 at 2:28
source share

This question is quite old, but I noticed that although people argued about sets versus lists, no one thought about using them. Following the lead of Soravux,

Worst case for lists:

 >>> timeit('bool(set(a) & set(b))', setup="a=list(range(10000)); b=[x+9999 for x in range(10000)]", number=100000) 100.91506409645081 >>> timeit('any(i in a for i in b)', setup="a=list(range(10000)); b=[x+9999 for x in range(10000)]", number=100000) 19.746716022491455 >>> timeit('any(i in a for i in b)', setup="a= set(range(10000)); b=[x+9999 for x in range(10000)]", number=100000) 0.092626094818115234 

And the best option for listings:

 >>> timeit('bool(set(a) & set(b))', setup="a=list(range(10000)); b=list(range(10000))", number=100000) 154.69790101051331 >>> timeit('any(i in a for i in b)', setup="a=list(range(10000)); b=list(range(10000))", number=100000) 0.082653045654296875 >>> timeit('any(i in a for i in b)', setup="a= set(range(10000)); b=list(range(10000))", number=100000) 0.08434605598449707 

Thus, even faster than repeating with two lists, iteration is performed, although the list to see if it is in the set, which makes sense, since checking whether the number is in the set takes a constant time, and checking by repeating the list takes time proportional to the length of the list.

So my conclusion is that iterate over the list and check it in the set .

+1
Oct 08 '14 at
source share

if you don't care what the overlapping element may be, you can simply check the len merged list compared to lists merged as a set. If there are overlapping elements, the set will be shorter:

len(set(a+b+c))==len(a+b+c) returns True if there is no overlap.

+1
Mar 10 '15 at 16:11
source share

I will add another functional programming style:

 any(map(lambda x: x in a, b)) 

Explanation:

 map(lambda x: x in a, b) 

returns a list of logical elements in which the elements of b are in a . This list is then passed to any , which simply returns True if any elements are True .

+1
Nov 09 '16 at 18:08
source share



All Articles