How to filter a list based on another list containing wildcards?

How can I filter a list based on another list that contains partial values and wildcards? The following example is what I have so far:

    l1 = ['test1', 'test2', 'test3', 'test4', 'test5']
    l2 = set(['*t1*', '*t4*'])
    filtered = [x for x in l1 if x not in l2]
    print(filtered)

This prints the whole list unchanged, because none of the items is literally equal to a pattern in l2:

 ['test1', 'test2', 'test3', 'test4', 'test5'] 

However, I want the patterns in l2 to be treated as wildcards, so that the result is:

 ['test2', 'test3', 'test5'] 
3 answers

Use the fnmatch module and a list comprehension with any():

    >>> from fnmatch import fnmatch
    >>> l1 = ['test1', 'test2', 'test3', 'test4', 'test5']
    >>> l2 = set(['*t1*', '*t4*'])
    >>> [x for x in l1 if not any(fnmatch(x, p) for p in l2)]
    ['test2', 'test3', 'test5']
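One caveat worth knowing: fnmatch() normalizes case according to the operating system, so the same patterns can behave differently on Windows than on Linux. A minimal sketch using fnmatchcase(), which always compares case-sensitively, is shown here; the helper name exclude_matching is hypothetical and not part of the answer above.

```python
from fnmatch import fnmatchcase

def exclude_matching(items, patterns):
    """Return the items that match none of the glob patterns (hypothetical helper)."""
    # fnmatchcase() never normalizes case, so results are the same on every OS.
    return [x for x in items if not any(fnmatchcase(x, p) for p in patterns)]

l1 = ['test1', 'test2', 'test3', 'test4', 'test5']
l2 = ['*t1*', '*t4*']
result = exclude_matching(l1, l2)
print(result)  # ['test2', 'test3', 'test5']
```

If case-insensitive matching is what you want on every platform, lowercasing both the item and the pattern before calling fnmatchcase() is one option.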

You can also use filter() instead of a list comprehension, which has the advantage that you can easily swap in a different filter function for more flexibility:

    >>> from fnmatch import fnmatch
    >>> l1 = ['test1', 'test2', 'test3', 'test4', 'test5']
    >>> l2 = set(['*t1*', '*t4*'])
    >>> filterfunc = lambda item: not any(fnmatch(item, pattern) for pattern in l2)
    >>> list(filter(filterfunc, l1))
    ['test2', 'test3', 'test5']
    >>> # Suppose we no longer like this filter function: we decide that l2
    >>> # should match on any partial match, so we can drop the asterisks:
    >>> l2 = set(['t1', 't4'])
    >>> filterfunc = lambda item: not any(pattern in item for pattern in l2)
    >>> list(filter(filterfunc, l1))
    ['test2', 'test3', 'test5']

You can even generalize filterfunc so that it works with several sets of patterns:

    >>> from functools import partial
    >>> def filterfunc(item, patterns):
    ...     return not any(pattern in item for pattern in patterns)
    ...
    >>> list(filter(partial(filterfunc, patterns=l2), l1))
    ['test2', 'test3', 'test5']
    >>> list(filter(partial(filterfunc, patterns={'t1', 'test5'}), l1))
    ['test2', 'test3', 'test4']

And of course, you can just as easily update filterfunc to accept regular expressions in the set of patterns.
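As a rough sketch of that regular-expression variant, filterfunc could call re.search() on each pattern. The pattern list regexes and its contents are assumptions chosen for illustration; they are not from the original answer.

```python
import re
from functools import partial

def filterfunc(item, patterns):
    # Keep the item only if none of the regular expressions is found in it.
    return not any(re.search(pattern, item) for pattern in patterns)

l1 = ['test1', 'test2', 'test3', 'test4', 'test5']
regexes = [r't[14]$']  # hypothetical pattern: a "t" followed by a final 1 or 4
result = list(filter(partial(filterfunc, patterns=regexes), l1))
print(result)  # ['test2', 'test3', 'test5']
```

Note that re.search() matches anywhere in the string, which mirrors the substring behaviour of the version above; re.fullmatch() would be the stricter choice if a pattern has to cover the whole item.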


I think the easiest approach for your use case is to simply check for substrings using Python's in operator (although this means removing the asterisks from the patterns):

    def remove_if_not_substring(l1, l2):
        # Keep only the items of l1 that contain none of the substrings in l2.
        return [i for i in l1 if not any(j in i for j in l2)]

So, given our data:

    l1 = ['test1', 'test2', 'test3', 'test4', 'test5']
    l2 = set(['t1', 't4'])

And calling our function with it:

 remove_if_not_substring(l1, l2) 

returns:

 ['test2', 'test3', 'test5'] 
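If the patterns arrive with their asterisks still attached, one possible way to reuse the substring check is to strip them first. This sketch assumes every pattern has the form `*text*`, with asterisks only at the ends; a pattern such as `te*t1` would not translate correctly. The helper name strip_wildcards is hypothetical.

```python
def strip_wildcards(patterns):
    # Assumption: asterisks appear only at the start and end of each pattern.
    return {p.strip('*') for p in patterns}

def remove_if_not_substring(l1, l2):
    # Keep only the items of l1 that contain none of the substrings in l2.
    return [i for i in l1 if not any(j in i for j in l2)]

l1 = ['test1', 'test2', 'test3', 'test4', 'test5']
wildcards = {'*t1*', '*t4*'}
result = remove_if_not_substring(l1, strip_wildcards(wildcards))
print(result)  # ['test2', 'test3', 'test5']
```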
