Search for intersection / difference between python lists

I have two python lists:

    a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
    b = ['the', 'when', 'send', 'we', 'us']

I need to filter out every element of a whose first item appears in b. In this case, I should get:

    c = [('why', 4), ('throw', 9), ('you', 1)]

What is the most efficient way to do this?

+7
6 answers

A list comprehension will work.

    a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
    b = ['the', 'when', 'send', 'we', 'us']
    filtered = [i for i in a if i[0] not in b]
    >>> print(filtered)
    [('why', 4), ('throw', 9), ('you', 1)]
+10

A list comprehension should work:

    c = [item for item in a if item[0] not in b]

Or with a dictionary comprehension (note that the result is a dict, not a list of tuples):

    d = dict(a)
    # Unpack each (key, value) pair; items() works on Python 2 and 3.
    c = {key: value for key, value in d.items() if key not in b}
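One caveat on the dictionary route, sketched here assuming Python 3: dict(a) keeps only the last value for a repeated key, so the result can differ from the list comprehension when a contains the same word more than once. The duplicated 'why' entry below is made up purely for illustration.

```python
# Hypothetical input: 'why' appears twice to show the difference.
a = [('when', 3), ('why', 4), ('throw', 9), ('why', 7)]
b = ['the', 'when', 'send', 'we', 'us']

# The list comprehension keeps every non-excluded tuple, in order.
as_list = [item for item in a if item[0] not in b]

# dict(a) collapses repeated keys, keeping the last value seen.
as_dict = {key: value for key, value in dict(a).items() if key not in b}

print(as_list)
print(as_dict)
```

If duplicates matter, the list comprehension is probably the safer choice.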
+3

The in operator is nice, but you should use a set, at least for b. If you have numpy, you can also try np.in1d, but you would need to time it to see whether it is actually faster.

    import numpy as np

    # Shameless copy of the list comprehension, but using a set for b.
    b_set = set(b)
    filtered = [i for i in a if i[0] not in b_set]

    # With numpy. A structured array needs a fixed maximum string length
    # (here 'U10', a unicode string of at most 10 characters); otherwise
    # use an object array, which is slower (likely not worth it) but safe.
    a_arr = np.array(a, dtype=[('key', 'U10'), ('val', int)])
    b_arr = np.asarray(b)  # build from the original list, not the set
    mask = ~np.in1d(a_arr['key'], b_arr)
    filtered = a_arr[mask]

Sets also have difference methods and the like, which are probably not useful here, but often are in general.
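To make the set suggestion concrete, here is a minimal sketch; the names excluded and kept_keys are illustrative and not part of the answer. Membership tests against a set take roughly constant time on average, and set difference works directly on the keys when the associated counts are not needed.

```python
a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']

# Build the set once, outside the comprehension.
excluded = set(b)

# Keep the full tuples whose first element is not excluded.
filtered = [item for item in a if item[0] not in excluded]

# Set difference on the keys alone; order and counts are lost.
kept_keys = {key for key, _ in a} - excluded

print(filtered)
print(sorted(kept_keys))
```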

+2

Since this is tagged numpy, here is a numpy solution using numpy.in1d, compared with the list comprehension:

    In [1]: a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
    In [2]: b = ['the', 'when', 'send', 'we', 'us']
    In [3]: a_ar = np.array(a, dtype=[('string','|S5'), ('number',float)])
    In [4]: b_ar = np.array(b)
    In [5]: %timeit filtered = [i for i in a if not i[0] in b]
    1000000 loops, best of 3: 778 ns per loop
    In [6]: %timeit filtered = a_ar[~np.in1d(a_ar['string'], b_ar)]
    10000 loops, best of 3: 31.4 us per loop

So, for 5 entries, the list comprehension is faster.

However, for a larger dataset (5000 entries here), the numpy solution is about twice as fast as the list comprehension:

    In [7]: a = a * 1000
    In [8]: a_ar = np.array(a, dtype=[('string','|S5'), ('number',float)])
    In [9]: %timeit filtered = [i for i in a if not i[0] in b]
    1000 loops, best of 3: 647 us per loop
    In [10]: %timeit filtered = a_ar[~np.in1d(a_ar['string'], b_ar)]
    1000 loops, best of 3: 302 us per loop
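These timings depend on the machine and library versions. A rough way to run a similar comparison with only the standard library is sketched below. Note that it compares list membership against set membership rather than numpy, and that the input size and repeat count are arbitrary choices.

```python
import timeit

a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)] * 1000
b = ['the', 'when', 'send', 'we', 'us']
b_set = set(b)


def with_list():
    """Filter using membership tests against the list b."""
    return [i for i in a if i[0] not in b]


def with_set():
    """Filter using membership tests against a set built from b."""
    return [i for i in a if i[0] not in b_set]


# Both variants must agree before the timings mean anything.
assert with_list() == with_set()

runs = 100
for func in (with_list, with_set):
    seconds = timeit.timeit(func, number=runs)
    print(func.__name__, seconds / runs, "seconds per call")
```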
+2

Try the following:

    a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
    b = ['the', 'when', 'send', 'we', 'us']
    c = []
    for x in a:
        # Keep the tuple only if its first element is not in b.
        if x[0] not in b:
            c.append(x)
    print(c)

Demo: http://ideone.com/zW7mzY

0

Use filter:

    # Tuple parameters in lambdas are Python 2 only, so index the item instead.
    c = list(filter(lambda item: item[0] not in b, a))
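For readers on Python 3, where filter returns a lazy iterator, itertools.filterfalse expresses "drop the matches" without negating the test. A small sketch, in which is_excluded is an illustrative helper name rather than anything from the answer:

```python
from itertools import filterfalse

a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']
excluded = set(b)


def is_excluded(item):
    """Return True when the tuple's first element is in the exclusion set."""
    return item[0] in excluded


# filterfalse keeps the items for which the predicate is false.
c = list(filterfalse(is_excluded, a))
print(c)
```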
-1