Finding list values in another list using Python

Question

Finding list values in another list using Python

I am trying to find a list sublist. Value if list1 says that [1,5] is in list2, say [1,4,3,5,6], than it should return True. What I'm still like this:

for nums in l1: if nums in l2: return True else: return False

That would be true, but I try to return True only if list1 is in list2 in the appropriate order. Therefore, if list2 is [5,2,3,4,1], it should return False. I was thinking through the lines of comparing the list1 index values using <but I'm not sure.

+3

python list

Al. Nov 08 '09 at 5:20

source share

7 answers

Alex martelli · Answer 1 · 2009-11-08T05:27:51+0000

 try: last_found = -1 for num in L1: last_found = L2.index(num, last_found + 1) return True except ValueError: return False

The index method of list L2 returns the position at which the first argument ( num ) was found in the list; called, as here, with a second argument, he begins to search the list at that position. If index does not find what it is looking for, it throws a ValueError exception.

So, this code uses this approach to search for each element num of L1 , in order, inside L2 . The first time he needs to start looking from position 0; each subsequent point in time, he should begin to look from the position immediately after the last one, where he found the previous element, i.e. last_found + 1 (so in the beginning we should set last_found = -1 to start looking at position 0 for the first time).

If each element in L1 is found in this way (i.e., it was found in L2 after the position where the previous element was found), then the two lists satisfy the given condition, and the code returns True . If any L1 element has never been found, the code catches the resulting ValueError exception and simply returns False .

Another approach would be to use iterators over two lists, which can be formed using the built-in iter function. You can "advance" the iterator by invoking the built-in next on it; this will increase StopIteration if there is no "next element", that is, the iterator is exhausted. You can also use for on an iterator for a smoother interface where applicable. Low level approach using iter / next idea:

 i1 = iter(L1) i2 = iter(L2) while True: try: lookfor = next(i1) except StopIteration: # no more items to look for == all good! return True while True: try: maybe = next(i2) except StopIteration: # item lookfor never matched == nope! return False if maybe == lookfor: break

or, a slightly higher level:

 i1 = iter(L1) i2 = iter(L2) for lookfor in i1: for maybe in i2: if maybe == lookfor: break else: # item lookfor never matched == nope! return False # no more items to look for == all good! return True

In fact, the only key use of iter here is to get i2 - having an inner loop like for maybe in i2 ensures that the inner loop will not start looking from the very beginning every time, but rather, it will keep looking where it last time. The outer loop can also be used for for lookfor in L1: since it does not have a reboot problem.

The key here is an else: loop else: , which starts if and only if the loop was not interrupted by a break , but rather a natural one.

While working on this idea, we were again reminded of the in operator, which could also be continued when it was last left with just an iterator. Great simplification:

 i2 = iter(L2) for lookfor in L1: if lookfor not in i2: return False # no more items to look for == all good! return True

But now we understand that this is exactly what abstracts with the short-circuited any and all built-in functions of the "short circuit", therefore ...:

 i2 = iter(L2) return all(lookfor in i2 for lookfor in L1)

which, I believe, is almost as simple as you can get. The only non-elementary bit left here: you need to explicitly use iter(L2) , only once to make sure that the in operator (internally closed loop) does not restart the search from the very beginning, but rather continues every time from which it expired.

mjv · Answer 2 · 2009-11-08T05:26:20+0000

This question is a bit like homework, so I would like to take the time and discuss what might be wrong with the snippet shown in the question.

Although you use the word in its plural form, for the nums variable you need to understand that Python will use this variable to store the ONE element from l1 at a time and go through the code block in this field for the block, “once for every other element.

Thus, the result of your current fragment will be an exit from the very first iteration, either True or False, if the first elements in the list accidentally fall.

Edit : Yes, A1, just like you said: the logic ends with True after the first iteration. This is due to a "return" when nums is in l2.
If you didn’t do anything in the “found” case, in the loop the logic would continue to finish any logic in the block (not here), and then it would start the next iteration. Therefore, it will only exit with the return value of "False" if the element from l1 was not found l2 (indeed, after the very first such element not found). Therefore, your logic is almost correct (if it didn’t do anything in the “found case”), then one of the missing ones would be to return “True” systematically after the for loop (because if this were not done, the output with a False value inside cycle, then all elements of l2 are in l1 ...).

There are two ways to change the code so that it does not do anything for the “found case”.
- using pass, which is a convenient way to instruct Python to do nothing;
"pass" is usually used when "something", i.e. some action is syntactically required, but we don’t want to do anything, but it can also be used for debugging, etc.
- by rewriting the test instead of not in instead

 if nums not in l2: return False #no else:, ie do nothing at all if found

Now ... Intro to the details.
Your program may have a flaw (with the proposed changes), i.e. It will consider l1 as a sublist of l2, even if l1 says 2 elements with a value of 5, with the result that l2 has only one such value. I am not sure that such a consideration is part of the problem (perhaps the understanding is that both lists are “sets”, without any duplicate elements). However, if duplicates were allowed, you will have to complicate the logic a bit (perhaps to make an intuitive copy of l2, and each time "nums" is in the copy of l2 to remove this element.

Another consideration is that perhaps a list can only be called a sublist if its elements are in the same order as the elements in another list ... Again, it all depends on how the problem is determined ... By the way, some proposed solutions, such as Alex Martelli, are written this way because they solve the problem in such a way that the order of the items with lists matters.

Paul · Answer 3 · 2009-11-08T06:16:27+0000

I think this solution is the fastest because it iterates only once, although in a longer list, and exits before the iteration is completed if a match is found. (Edit: However, this is not as concise or quick as Alex's latest solution)

 def ck(l1,l2): i,j = 0,len(l1) for e in l2: if e == l1[i]: i += 1 if i == j: return True return False

The improvement was proposed by Anurag Uniyal (see comment) and is reflected in the demo below.

Below are some speed results for a range of list size values (List l1 is a list of 10 items containing random values from 1 to 10. List l2 has a range from 10 to 1000 (and also contains random values from 1 to 10).

Code that compares runtime and displays results:

 import random import os import pylab import timeit def paul(l1,l2): i = 0 j = len(l1) try: for e in l2: if e == l1[i]: i += 1 except IndexError: # thanks Anurag return True return False def jed(list1, list2): try: for num in list1: list2 = list2[list2.index(num):] except: return False else: return True def alex(L1,L2): # wow! i2 = iter(L2) return all(lookfor in i2 for lookfor in L1) from itertools import dropwhile from operator import ne from functools import partial def thc4k_andrea(l1, l2): it = iter(l2) try: for e in l1: dropwhile(partial(ne, e), it).next() return True except StopIteration: return False ct = 100 ss = range(10,1000,100) nms = 'paul alex jed thc4k_andrea'.split() ls = dict.fromkeys(nms) for nm in nms: ls[nm] = [] setup = 'import test_sublist as x' for s in ss: l1 = [random.randint(1,10) for i in range(10)] l2 = [random.randint(1,10) for i in range(s)] for nm in nms: stmt = 'x.'+nm+'(%s,%s)'%(str(l1),str(l2)) t = timeit.Timer(setup=setup, stmt=stmt).timeit(ct) ls[nm].append( t ) pylab.clf() for nm in nms: print len(ss), len(ls[nm]) pylab.plot(ss,ls[nm],label=nm) pylab.legend(loc=0) pylab.xlabel('length of l2') pylab.ylabel('time') pylab.savefig('cmp_lsts.png') os.startfile('cmp_lsts.png')

Results:

Andrea Ambu · Answer 4 · 2009-11-08T12:44:37+0000

This should be easy to understand and avoid a corner case, since you don't have to work with indexes:

 def compare(l1, l2): it = iter(l2) for e in l1: try: while it.next() != e: pass except StopIteration: return False return True

he tries to compare each e lement l1 with the next element in l2 .
if there is no next element (StopIteration), it returns false (he visited all l2 and did not find the current e ), otherwise he found it, so it returns true.

For faster execution, this may help to put the try block outside the bounds for:

 def compare(l1, l2): it = iter(l2) try: for e in l1: while it.next() != e: pass except StopIteration: return False return True

Jed smith · Answer 5 · 2009-11-08T05:35:28+0000

I have a feeling that this is more intense than Alex's answer, but here was my first thought:

 def test(list1, list2): try: for num in list1: list2 = list2[list2.index(num):] except: return False else: return True

Edit: Just tried it. Its faster. He is closing.

Edit 2: Moved try / except from the loop (which is why others should look at your code). Thanks gnibbler.

Corey porter · Answer 6 · 2009-11-08T07:05:23+0000

It’s not easy for me to see such questions and not want Python list processing to look more like Haskell's. This seems like a much simpler solution than anything I could offer in Python:

 contains_inorder :: Eq a => [a] -> [a] -> Bool contains_inorder [] _ = True contains_inorder _ [] = False contains_inorder (x:xs) (y:ys) | x == y = contains_inorder xs ys | otherwise = contains_inorder (x:xs) ys

Jochen ritzel · Answer 7 · 2009-11-08T15:58:55+0000

Andrea's ultra optimized version:

 from itertools import dropwhile from operator import ne from functools import partial def compare(l1, l2): it = iter(l2) try: for e in l1: dropwhile(partial(ne, e), it).next() return True except StopIteration: return False

This can be written even more functional style:

 def compare(l1,l2): it = iter(l2) # any( True for .. ) because any([0]) is False, which we don't want here return all( any(True for _ in dropwhile(partial(ne, e), it)) for e in l1 )

Finding list values ​​in another list using Python

More articles:

Finding list values in another list using Python