Determine if all items are in the list and in the same order in another list.

How to create a sublist() function that takes two lists, list1 and list2 , and returns True if list1 is a sublist of list2 and False otherwise. list1 is a sublayer of list2 if the numbers in list1 appear in list2 in the same order as in list1 , but not necessarily sequentially. For example,

 >>> sublist([1, 12, 3],[25, 1, 30, 12, 3, 40]) True >>> sublist([5, 90, 2],[90, 20, 5, 2, 17]) False 
+7
python list order sublist
source share
10 answers

Here is one way to do this in linear time (and constant space) using an iterator:

 def sublist(a, b): seq = iter(b) try: for x in a: while next(seq) != x: pass else: return True except StopIteration: pass return False 

Basically, he goes through each element of the sublist and sees if he can find the same element in the part of the complete list that he has not looked at yet. If this happens through the entire sublist, it means that we have a match (hence the else statement in the for loop). If we run out of items for viewing in the full list, this means that we do not have a match.

Edit: I updated my solution, so it works with Python 3. For Python 2.5 and seq.next() next(seq) needs to be replaced with seq.next() .

+7
source share

Very rude decision:

 def sublist(a, b): if not a: return True for k in range(len(b)): if a[0] == b[k]: return sublist(a[1:], b[k+1:]) return False print sublist([1, 12, 3], [25, 1, 30, 12, 3, 40]) # True print sublist([12, 1, 3], [25, 1, 30, 12, 3, 40]) # False 

Edit: speed update

+5
source share

Here is a simplified version:

 def sublist(a,b): try: return a[0] in b and sublist(a[1:],b[1+b.index(a[0]):]) except IndexError: return True >>> print sublist([1, 12, 3],[25, 1, 30, 12, 3, 40]) True >>> print sublist([5, 90, 2],[90, 20, 5, 2, 17]) False 
+2
source share

Here's an iterative solution that should have optimal asymptotics:

 def sublist(x, y): if x and not y: return False i, lim = 0, len(y) for e in x: while e != y[i]: i += 1 if i == lim: return False i += 1 return True 
Decision

@ sshashank124 has the same complexity, but the dynamics will be slightly different: its version crosses the second argument several times, but since it pushes more work to the C layer, it will probably be much faster at a lower input.

Edit : @hetman's solution has essentially the same logic, but much more Pythonic, although, contrary to my expectations, it looks a little slower. (I was also wrong about the performance of @ sshashan124's solution, the overhead of recursive calls outweighs the advantage of doing more work on C.)

+2
source share

Congratulations on a deceptively difficult question. I think this will work, but I won’t be shocked if I missed the corner case, especially with repeating elements. Revised version inspired by Hgu Nguyen recursive solution:

 def sublist(a, b): index_a = 0 index_b = 0 len_a = len(a) len_b = len(b) while index_a < len_a and index_b < len_b: if a[index_a] == b[index_b]: index_a += 1 index_b += 1 else: index_b += 1 return index_a == len_a 

Some rough profiling:

On lists requiring passing most or most of b , my algorithm suffers:

 a = [1, 3, 999999] b = list(range(1000000)) 

On my PC, the Huu Nguyen or Hetman algorithm takes about 10 seconds to complete 100 iterations of the check. My algorithm takes 20 seconds.

Given earlier success, the Huu algorithm is far behind:

 a = [1, 3, 5] 

The hetman algorithm or mine can complete 100 thousand checks per second - the hetman in 0.13 seconds on my PC, mine in 0.19 seconds. Huu takes 16 seconds to complete a 1k check. I am frankly struck by this difference - recursion can be slow if you do not optimize the compiler, I know, but it is 4 orders of magnitude worse than I expected.

Given the list of crashes a , performance returns to what I saw when you need to crawl the entire second list - it’s understandable, since there is no way to find out that at the end there will not be a sequence that matches an otherwise unsupported list.

 a = [3, 1, 5] 

Again, about 10 seconds for Huu Nguyen or Getman's algorithm for 100 tests, 20 for mine.

Longer ordered lists support the pattern I saw for early success. EG:

 a = range(0, 1000, 20) 

With the Getman algorithm, which took 10.99 seconds to complete 100k tests, while mine took 24.08. Huu took 28.88 to complete 100 tests.

This is admittedly not a complete range of tests that you could perform, but in all cases the Getman algorithm performed the best results.

+2
source share

How about this: allow it the other way around:

 def sublist(a,b): """returns True if a is contained in b and in the same order""" return a == [ch for ch in b if ch in a] 

In some cases, this will fail (for example, if [1,2,3] is a subset of [1,1,8,2,3] ), but it’s hard to say exactly how you want it to be implemented ...

+1
source share

For a quick and dirty solution that runs slowly, but will be fully adequate for the size arrays you showed:

 def sublist(a,b): last = 0 for el_a in a: if el_a in b[last:]: last = b[last:].index(el_a) else: return False return True 

** Edited to work with non-contiguous elements

+1
source share

Here is another solution that may be understandable for beginners than hetman . (Note that this is very close to the OP implementation in this duplicate question , but avoiding the problem of restarting the search from the beginning of b each time.)

 def sublist(a, b): i = -1 try: for e in a: i = b.index(e, i+1) except ValueError: return False else: return True 

Of course, this requires b be list , while Hetman's answer allows any iterable. And I think that (for people who understand Python well), it is less simple than Hetman's answer.

Algorithmically, he does the same as the Getman answer, so the time is O (N) and O (1). But in practice it can be faster, at least in CPython, since we move the inner loop from Python while around the iterator to the C quick-getindex loop (inside list.index ). Again, this may be slower because we are copying this i value instead of having all the state embedded inside the iterator (C-implemented). If that matters, check them as with real data. :)

+1
source share

here is the best solution using regex :

 import re def exiSt(short,long): r = ''.join(["(.*"+str[x]+")" for x in short]) return re.match(r,','.join([str(x) for x in long])) == None long = [1, 2, 3, 4, 5] short1 = [1,2,5] short2 = [1,5,3] exiSt(short1,long) >> True exiSt(short2,long) >> False 
0
source share
 def sublist(a, b): "if set_a is not subset of set_b then obvious answer is False" if not set(a).issubset(set(b)): return False n = 0 for i in a: if i in b[n:]: "+1 is needed to skip consecutive duplicates, ie sublist([2,1,1],[2,1]) = False" "if +1 is absent then sublist([2,1,1],[2,1]) = True" "choose to use or not to use +1 according to your needs" n += b[n:].index(i) + 1 else: return False return True 
0
source share

All Articles