Determine if all items are in the list and in the same order in another list.

Question

Determine if all items are in the list and in the same order in another list.

How to create a sublist() function that takes two lists, list1 and list2 , and returns True if list1 is a sublist of list2 and False otherwise. list1 is a sublayer of list2 if the numbers in list1 appear in list2 in the same order as in list1 , but not necessarily sequentially. For example,

 >>> sublist([1, 12, 3],[25, 1, 30, 12, 3, 40]) True >>> sublist([5, 90, 2],[90, 20, 5, 2, 17]) False

+7

python list order sublist

user3638865 May 15, '14 at 1:19

source share

10 answers

hetman · Answer 1 · 2014-05-15T02:21:43+0000

Here is one way to do this in linear time (and constant space) using an iterator:

 def sublist(a, b): seq = iter(b) try: for x in a: while next(seq) != x: pass else: return True except StopIteration: pass return False

Basically, he goes through each element of the sublist and sees if he can find the same element in the part of the complete list that he has not looked at yet. If this happens through the entire sublist, it means that we have a match (hence the else statement in the for loop). If we run out of items for viewing in the full list, this means that we do not have a match.

Edit: I updated my solution, so it works with Python 3. For Python 2.5 and seq.next() next(seq) needs to be replaced with seq.next() .

huu · Answer 2 · 2014-05-15T01:39:59+0000

Very rude decision:

 def sublist(a, b): if not a: return True for k in range(len(b)): if a[0] == b[k]: return sublist(a[1:], b[k+1:]) return False print sublist([1, 12, 3], [25, 1, 30, 12, 3, 40]) # True print sublist([12, 1, 3], [25, 1, 30, 12, 3, 40]) # False

Edit: speed update

sshashank124 · Answer 3 · 2014-05-15T01:53:54+0000

Here is a simplified version:

 def sublist(a,b): try: return a[0] in b and sublist(a[1:],b[1+b.index(a[0]):]) except IndexError: return True >>> print sublist([1, 12, 3],[25, 1, 30, 12, 3, 40]) True >>> print sublist([5, 90, 2],[90, 20, 5, 2, 17]) False

Alp · Answer 4 · 2014-05-15T01:59:32+0000

Here's an iterative solution that should have optimal asymptotics:

 def sublist(x, y): if x and not y: return False i, lim = 0, len(y) for e in x: while e != y[i]: i += 1 if i == lim: return False i += 1 return True

Decision

@ sshashank124 has the same complexity, but the dynamics will be slightly different: its version crosses the second argument several times, but since it pushes more work to the C layer, it will probably be much faster at a lower input.

Edit : @hetman's solution has essentially the same logic, but much more Pythonic, although, contrary to my expectations, it looks a little slower. (I was also wrong about the performance of @ sshashan124's solution, the overhead of recursive calls outweighs the advantage of doing more work on C.)

Peter DeGlopper · Answer 5 · 2014-05-15T02:08:07+0000

Congratulations on a deceptively difficult question. I think this will work, but I won’t be shocked if I missed the corner case, especially with repeating elements. Revised version inspired by Hgu Nguyen recursive solution:

 def sublist(a, b): index_a = 0 index_b = 0 len_a = len(a) len_b = len(b) while index_a < len_a and index_b < len_b: if a[index_a] == b[index_b]: index_a += 1 index_b += 1 else: index_b += 1 return index_a == len_a

Some rough profiling:

On lists requiring passing most or most of b , my algorithm suffers:

 a = [1, 3, 999999] b = list(range(1000000))

On my PC, the Huu Nguyen or Hetman algorithm takes about 10 seconds to complete 100 iterations of the check. My algorithm takes 20 seconds.

Given earlier success, the Huu algorithm is far behind:

 a = [1, 3, 5]

The hetman algorithm or mine can complete 100 thousand checks per second - the hetman in 0.13 seconds on my PC, mine in 0.19 seconds. Huu takes 16 seconds to complete a 1k check. I am frankly struck by this difference - recursion can be slow if you do not optimize the compiler, I know, but it is 4 orders of magnitude worse than I expected.

Given the list of crashes a , performance returns to what I saw when you need to crawl the entire second list - it’s understandable, since there is no way to find out that at the end there will not be a sequence that matches an otherwise unsupported list.

 a = [3, 1, 5]

Again, about 10 seconds for Huu Nguyen or Getman's algorithm for 100 tests, 20 for mine.

Longer ordered lists support the pattern I saw for early success. EG:

 a = range(0, 1000, 20)

With the Getman algorithm, which took 10.99 seconds to complete 100k tests, while mine took 24.08. Huu took 28.88 to complete 100 tests.

This is admittedly not a complete range of tests that you could perform, but in all cases the Getman algorithm performed the best results.

Adam smith · Answer 6 · 2014-05-15T01:28:45+0000

How about this: allow it the other way around:

 def sublist(a,b): """returns True if a is contained in b and in the same order""" return a == [ch for ch in b if ch in a]

In some cases, this will fail (for example, if [1,2,3] is a subset of [1,1,8,2,3] ), but it’s hard to say exactly how you want it to be implemented ...

John H. · Answer 7 · 2014-05-15T01:36:35+0000

For a quick and dirty solution that runs slowly, but will be fully adequate for the size arrays you showed:

 def sublist(a,b): last = 0 for el_a in a: if el_a in b[last:]: last = b[last:].index(el_a) else: return False return True

** Edited to work with non-contiguous elements

abarnert · Answer 8 · 2015-06-11T22:47:08+0000

Here is another solution that may be understandable for beginners than hetman . (Note that this is very close to the OP implementation in this duplicate question , but avoiding the problem of restarting the search from the beginning of b each time.)

 def sublist(a, b): i = -1 try: for e in a: i = b.index(e, i+1) except ValueError: return False else: return True

Of course, this requires b be list , while Hetman's answer allows any iterable. And I think that (for people who understand Python well), it is less simple than Hetman's answer.

Algorithmically, he does the same as the Getman answer, so the time is O (N) and O (1). But in practice it can be faster, at least in CPython, since we move the inner loop from Python while around the iterator to the C quick-getindex loop (inside list.index ). Again, this may be slower because we are copying this i value instead of having all the state embedded inside the iterator (C-implemented). If that matters, check them as with real data. :)

farhawa · Answer 9 · 2015-06-11T22:43:10+0000

here is the best solution using regex :

 import re def exiSt(short,long): r = ''.join(["(.*"+str[x]+")" for x in short]) return re.match(r,','.join([str(x) for x in long])) == None long = [1, 2, 3, 4, 5] short1 = [1,2,5] short2 = [1,5,3] exiSt(short1,long) >> True exiSt(short2,long) >> False

Trigremm · Answer 10 · 2017-03-09T10:17:10+0000

 def sublist(a, b): "if set_a is not subset of set_b then obvious answer is False" if not set(a).issubset(set(b)): return False n = 0 for i in a: if i in b[n:]: "+1 is needed to skip consecutive duplicates, ie sublist([2,1,1],[2,1]) = False" "if +1 is absent then sublist([2,1,1],[2,1]) = True" "choose to use or not to use +1 according to your needs" n += b[n:].index(i) + 1 else: return False return True

Determine if all items are in the list and in the same order in another list.

More articles: