How to get n list items not contained in another?

I have two lists of different sizes (either one may be the larger), with some common elements. I would like to get n items from the first list that are not in the second.

I see two families of solutions (the example below uses n=3):

a = [i for i in range(2, 10)]
b = [i * 2 for i in range(1, 10)]
# [2, 3, 4, 5, 6, 7, 8, 9] [2, 4, 6, 8, 10, 12, 14, 16, 18]

# solution 1: generate the whole list, then slice
s1 = list(set(a) - set(b))
s2 = [i for i in a if i not in b]

for i in [s1, s2]:
    print(i[:3])

# solution 2: the simple loop solution
c = 0
s3 = []
for i in a:
    if i not in b:
        s3.append(i)
        c += 1
        if c == 3:
            break
print(s3)

They all work; the output is:

[9, 3, 5]
[3, 5, 7]
[3, 5, 7]

(The first solution does not return the first 3 because set does not preserve order, but that is fine in my case, since my lists will be unsorted, even deliberately shuffled, anyway.)

Are these the most Pythonic and reasonably optimal approaches?

Solution 1 computes the whole difference first and then slices, which I find rather inefficient (my lists will have about 100 thousand elements, and I only need the first 100).

Use set.difference and slice:

print(list(set(a).difference(b))[:3])
[3, 5, 7]

set.difference returns the elements of a that are not in b:

set([3, 5, 7, 9])

Or use iter and next to take only the first n elements:

diff = iter(set(a).difference(b))
n = 3
sli = [next(diff) for _ in range(n)]
print(sli)
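
One caveat with the next-based version: if the difference holds fewer than n items, next raises StopIteration. A minimal sketch of a variant using itertools.islice, which simply stops when the iterator runs out. The helper name take_from_difference is made up for illustration:

```python
from itertools import islice


def take_from_difference(a, b, n):
    """Return up to n items of a that are not in b (arbitrary order)."""
    # set.difference accepts any iterable, so b need not be converted first.
    diff = set(a).difference(b)
    # islice stops quietly when diff is exhausted, unlike repeated next() calls.
    return list(islice(diff, n))


a = list(range(2, 10))
b = [i * 2 for i in range(1, 10)]
print(take_from_difference(a, b, 3))   # three of 3, 5, 7, 9; order is not guaranteed
print(take_from_difference(a, b, 10))  # only four such items exist, so four are returned
```

This still builds the full set difference before slicing, so it only fixes the short-input case, not the cost.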

Timings for set.difference against the - operator:

In [1]: a = [i for i in range(2, 10000000)]  
In [2]: b = [i * 2 for i in range (1, 10000000)]   
In [3]: timeit set(a).difference(b)
1 loops, best of 3: 848 ms per loop    
In [4]: timeit set(a)- set(b)
1 loops, best of 3: 1.54 s per loop

s2 = [i for i in a if i not in b] scans the list b for every element of a, so it becomes very slow as the lists grow.
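
The cost of s2 comes from `i not in b` walking the list b from the start for every element of a. A small sketch, with hypothetical helper names, showing that converting b to a set once gives the same result while keeping the order of a (it assumes the elements are hashable):

```python
def not_in_list(a, b):
    # Each membership test walks b linearly: O(len(a) * len(b)) overall.
    return [i for i in a if i not in b]


def not_in_set(a, b):
    # Build the set once; each membership test is then O(1) on average.
    bset = set(b)
    return [i for i in a if i not in bset]


a = list(range(2, 10))
b = [i * 2 for i in range(1, 10)]
print(not_in_list(a, b))  # [3, 5, 7, 9]
print(not_in_set(a, b))   # [3, 5, 7, 9]
```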

Timing iter with .difference:

In [11]: %%timeit                                
diff = iter(set(a).difference(b))
n = 3
sli = [next(diff) for _ in range(n)]
   ....: 
1 loops, best of 3: 797 ms per loop

Timings for several approaches that take the first 100 elements:

import random
from itertools import islice

def m1(a,b):
    return list(set(a) - set(b))[:100]
def m2(a,b):
    return list(set(a).difference(b))[:100] 
def m3(a,b):
    return list(islice(set(a).difference(b), 100))
def m4(a,b):
    bset = set(b)
    return list(islice((x for x in a if x not in bset), 100))

>>> a = [random.randint(0, 10**6) for i in range(10**5)]
>>> b = [random.randint(0, 10**6) for i in range(10**5)]
>>> %timeit m1(a,b)
10 loops, best of 3: 121 ms per loop
>>> %timeit m2(a,b)
10 loops, best of 3: 98.7 ms per loop
>>> %timeit m3(a,b)
10 loops, best of 3: 82.3 ms per loop
>>> %timeit m4(a,b)
10 loops, best of 3: 42.8 ms per loop
>>> 
>>> a = list(range(10**5))
>>> b = [i*2 for i in a]
>>> %timeit m1(a,b)
10 loops, best of 3: 58.7 ms per loop
>>> %timeit m2(a,b)
10 loops, best of 3: 50.8 ms per loop
>>> %timeit m3(a,b)
10 loops, best of 3: 40.7 ms per loop
>>> %timeit m4(a,b)
10 loops, best of 3: 21.7 ms per loop

m4 is the fastest in both tests: it only builds bset and stops scanning a as soon as 100 elements have been found.
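
The %timeit magic used above is specific to IPython. A rough sketch of reproducing the m3 versus m4 comparison with the standard timeit module, assuming Python 3; the seed and the repeat count are arbitrary choices, and absolute timings will differ by machine:

```python
import random
import timeit
from itertools import islice


def m3(a, b):
    return list(islice(set(a).difference(b), 100))


def m4(a, b):
    bset = set(b)
    return list(islice((x for x in a if x not in bset), 100))


random.seed(0)  # arbitrary seed, only to make runs repeatable
a = [random.randint(0, 10**6) for _ in range(10**5)]
b = [random.randint(0, 10**6) for _ in range(10**5)]

for func in (m3, m4):
    # number=10 mirrors the "10 loops" reported by IPython.
    seconds = timeit.timeit(lambda: func(a, b), number=10)
    print(f"{func.__name__}: {seconds / 10 * 1000:.1f} ms per call")
```

Note that m4 keeps the order of a and may return duplicates if a contains them, while m3 returns distinct items in arbitrary order.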

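You can turn b into a set, but leave a as a list. Build a generator so the filtering is lazy, then use a comprehension to pull out the elements you need:

a = [i for i in range(2, 10)]
b = [i * 2 for i in range(1, 10)]
bset = set(b)  # build the set once instead of on every membership test
agen = (i for i in a if i not in bset)
# Note: this comprehension still walks all of agen;
# itertools.islice(agen, 3) would stop after the third match.
first3 = [j for (i, j) in enumerate(agen) if i < 3]
print(first3)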