The most efficient way to find elements not in a list in Python

I am trying to improve the performance of a script that takes several lists and counts how many items in each are not in another "master" list (list_of_all_items).

It seems like there might be a more efficient way to do this, perhaps by combining the lookups in some way?

purple_count, brown_count, blue_count = 0, 0, 0

for item in list_of_purple_items:
    if item not in list_of_all_items:
        purple_count += 1

for item in list_of_brown_items:
    if item not in list_of_all_items:
        brown_count += 1

for item in list_of_blue_items:
    if item not in list_of_all_items:
        blue_count += 1

EDIT:

Thank you for your help. I ran a quick test on a large test case to find out which approach works best:

    my original: 30.21s
           sets: 00.02s
         filter: 30.01s
  sum generator: 31.08s

It's amazing how much more efficient it is to use sets.

Thanks again.

+4
2 answers

Use sets, so you do not need to keep looping:

# Build the master set once and reuse it for every colour.
set_of_all_items = set(list_of_all_items)
purple_count = len(set(list_of_purple_items).difference(set_of_all_items))
brown_count = len(set(list_of_brown_items).difference(set_of_all_items))
blue_count = len(set(list_of_blue_items).difference(set_of_all_items))

This replaces the repeated list scans with hash lookups, and the remaining loops run in C (inside set).
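One caveat with the set-based version: converting a colour list to a set collapses duplicates, so the result is the number of distinct missing items, whereas the loops in the question count every occurrence. A minimal sketch of the difference, using made-up sample data (the values are assumptions, only the variable names come from the question):

```python
# Hypothetical sample data; the names mirror those used in the question.
list_of_all_items = ["a", "b", "c"]
list_of_purple_items = ["a", "x", "x", "y"]

set_of_all_items = set(list_of_all_items)

# Set difference: each distinct missing item is counted once.
distinct_missing = len(set(list_of_purple_items).difference(set_of_all_items))

# Same semantics as the original loop: every occurrence is counted.
occurrences_missing = sum(
    1 for item in list_of_purple_items if item not in set_of_all_items
)

print(distinct_missing)     # 2 ("x" and "y")
print(occurrences_missing)  # 3 ("x", "x" and "y")
```

If the lists never contain duplicates, both counts agree and the set difference is the shorter way to write it.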

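Note that set.difference() accepts any iterable, not just a set; in this quick test, passing a list was about as fast as passing a set: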

>>> import timeit
>>> import random
>>> all = list(range(10000))  # list() is required on Python 3, where range() is lazy
>>> random.shuffle(all)
>>> all[:-1000] = []
>>> some = [random.randrange(10000) for _ in range(1000)]
>>> timeit.timeit('len(set(some).difference(all))', 'from __main__ import some, all', number=10000)
0.9517788887023926
>>> timeit.timeit('len(set(some).difference(all))', 'from __main__ import some, all; all = set(all)', number=10000)
0.90407395362854
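The much larger gap reported in the question's edit comes from testing membership against a list versus a set. A rough sketch of how that comparison might be reproduced is below; the sizes and the names master, with_list and with_set are invented for illustration, and the actual timings depend on the machine and the data:

```python
import random
import timeit

# Hypothetical sizes, kept small so the comparison finishes quickly.
master = random.sample(range(10000), 1000)
some = [random.randrange(10000) for _ in range(1000)]
master_set = set(master)

def with_list():
    # Each "not in" scans the list: O(len(master)) per lookup.
    return sum(1 for item in some if item not in master)

def with_set():
    # Each "not in" is a hash lookup: O(1) on average.
    return sum(1 for item in some if item not in master_set)

# Both variants must agree before their timings are worth comparing.
assert with_list() == with_set()

print("list lookup:", timeit.timeit(with_list, number=10))
print("set lookup: ", timeit.timeit(with_set, number=10))
```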
+13

Use sum with a set:

main_set = set(list_of_all_items)
purple_count = sum(1 for i in set(list_of_purple_items) if i not in main_set)
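To handle all three colours without repeating the expression, the same idea could be wrapped in a small helper. This is only a sketch: count_missing and colour_lists are hypothetical names, the sample data is made up, and the helper iterates over the list itself so that duplicates are counted as in the question's loops.

```python
def count_missing(items, known):
    """Count the items (duplicates included) that are absent from `known`."""
    return sum(1 for item in items if item not in known)

# Hypothetical sample data standing in for the lists in the question.
list_of_all_items = [1, 2, 3, 4]
colour_lists = {
    "purple": [1, 5, 6],
    "brown": [2, 3],
    "blue": [7, 7, 4],
}

main_set = set(list_of_all_items)  # built once, reused for every list
counts = {
    name: count_missing(items, main_set)
    for name, items in colour_lists.items()
}
print(counts)  # {'purple': 2, 'brown': 0, 'blue': 2}
```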
+2
