Determine if 2 lists have the same items regardless of order?

Sorry for the simple question, but it's hard for me to find the answer.

When I compare 2 lists, I want to know if they are "equal" because they have the same content, but in a different order.

Example:

x = ['a', 'b'] y = ['b', 'a'] 

I want x == y evaluated to True .

+71
python equality list
Jan 15 2018-12-01T00: 00Z
source share
4 answers

You can simply check if the multisets are equal with the elements x and y:

 import collections collections.Counter(x) == collections.Counter(y) 

This requires elements to be hashed; runtime will be in O(n) , where n is the size of the lists.

If the elements are also unique, you can also convert to sets (the same asymptotic execution time, in practice, can be a little faster):

 set(x) == set(y) 

If the elements are not hashable but sortable, another alternative (runtime in O(n log n) ) is

 sorted(x) == sorted(y) 

If the elements are neither hashable nor sortable, you can use the following helper function. Note that it will be rather slow ( O(n²) ) and should usually not be used outside the esoteric case of unpackable and unsortable items.

 def equal_ignore_order(a, b): """ Use only when elements are neither hashable nor sortable! """ unmatched = list(b) for element in a: try: unmatched.remove(element) except ValueError: return False return not unmatched 
+91
Jan 15 '12 at 0:40
source share

Determine if 2 lists have the same elements regardless of order?

The conclusion from your example:

 x = ['a', 'b'] y = ['b', 'a'] 

that the elements of the lists will not be repeated (they are unique), as well as hashable (which lines and other specific python immutable objects), the most direct and computationally efficient Python answer to the built-in sets (which are semantically similar mathematical sets about which you are possibly , learned at school).

 set(x) == set(y) # prefer this if elements are hashable 

In case the items are hashed, but not unique, collections.Counter also works semantically as a multiset, but it is much slower:

 from collections import Counter Counter(x) == Counter(y) 

Prefer to use sorted :

 sorted(x) == sorted(y) 

if items are ordered. This will not take into account unique or faulty circumstances, but it can be much slower than using sets.

Empirical experiment

In an empirical experiment, it is concluded that set should be preferred, then sorted . Use only Counter if you need other things, such as counting or further use as a multiset.

First setup:

 import timeit import random from collections import Counter data = [str(random.randint(0, 100000)) for i in xrange(100)] data2 = data[:] # copy the list into a new one def sets_equal(): return set(data) == set(data2) def counters_equal(): return Counter(data) == Counter(data2) def sorted_lists_equal(): return sorted(data) == sorted(data2) 

And testing:

 >>> min(timeit.repeat(sets_equal)) 13.976069927215576 >>> min(timeit.repeat(counters_equal)) 73.17287588119507 >>> min(timeit.repeat(sorted_lists_equal)) 36.177085876464844 

So, we see that comparing sets is the fastest solution, and comparing sorted lists is the second fastest.

+8
02 Oct '14 at 15:26
source share

This seems to work, although perhaps cumbersome for large lists.

 >>> A = [0, 1] >>> B = [1, 0] >>> C = [0, 2] >>> not sum([not i in A for i in B]) True >>> not sum([not i in A for i in C]) False >>> 

However, if each list should contain all the elements of another, then the above code is problematic.

 >>> A = [0, 1, 2] >>> not sum([not i in A for i in B]) True 

The problem arises when len(A) != len(B) and in this example len(A) > len(B) . To avoid this, you can add another expression.

 >>> not sum([not i in A for i in B]) if len(A) == len(B) else False False 

One more thing, I compared my decision with timeit.repeat, under the same conditions that Aaron Hall used in his post. As expected, the results are disappointing. My method is the last. set(x) == set(y) this.

 >>> def foocomprehend(): return not sum([not i in data for i in data2]) >>> min(timeit.repeat('fooset()', 'from __main__ import fooset, foocount, foocomprehend')) 25.2893661496 >>> min(timeit.repeat('foosort()', 'from __main__ import fooset, foocount, foocomprehend')) 94.3974742993 >>> min(timeit.repeat('foocomprehend()', 'from __main__ import fooset, foocount, foocomprehend')) 187.224562545 
+1
Jan 19 '15 at 2:40
source share

As mentioned in the comments above, the general case is pain. This is pretty easy if all elements are hashed or all elements are sorted. However, I recently had to try to solve a general case. Here is my solution. I realized after the publication that this is a duplicate of the solution above that I missed in the first pass. In any case, if you use slices rather than list.remove (), you can compare immutable sequences.

 def sequences_contain_same_items(a, b): for item in a: try: i = b.index(item) except ValueError: return False b = b[:i] + b[i+1:] return not b 
-one
Dec 12 '15 at 14:23
source share



All Articles