List filtering in Python

I want to filter out duplicate items in my list, e.g.

foo = ['a','b','c','a','b','d','a','d'] 

I'm only interested in:

 ['a','b','c','d'] 

What would be an effective way to achieve this? Greetings

+4
10 answers

Convert foo to a set if you don't need to preserve the order of the elements.

+12

list(set(foo)) if you are using Python 2.5 or higher, but this does not preserve order.
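A minimal sketch of the set-based approach, using the foo list from the question. It assumes the elements are hashable. The order of the resulting list is arbitrary, so the example sorts it before printing.

```python
foo = ['a', 'b', 'c', 'a', 'b', 'd', 'a', 'd']

# A set keeps one copy of each hashable element but does not remember
# the order in which the elements were first seen.
unique_items = list(set(foo))

# Sort before printing, because the list order is not guaranteed.
print(sorted(unique_items))
```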

+21

Since there is no order-preserving answer yet, I suggest the following:

    >>> temp = set()
    >>> [c for c in foo if c not in temp and (temp.add(c) or True)]
    ['a', 'b', 'c', 'd']

which can also be written as follows (in Python 3, filter returns an iterator, so it is wrapped in list()):

    >>> temp = set()
    >>> list(filter(lambda c: c not in temp and (temp.add(c) or True), foo))
    ['a', 'b', 'c', 'd']

Depending on how many elements are in foo, repeated hash lookups in a set may give faster results than repeated linear searches through a temporary list.

c not in temp checks that temp does not yet contain c; the or True part makes sure that c is emitted into the output list once it has been added to the set, because temp.add(c) itself returns None.
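To see why the or True is needed, here is a small sketch of how the condition evaluates. set.add always returns None, so temp.add(c) or True evaluates to True after performing the insertion as a side effect. The 'x' and 'y' values are throwaway examples.

```python
foo = ['a', 'b', 'c', 'a', 'b', 'd', 'a', 'd']

temp = set()

# set.add mutates the set and returns None, which is falsy.
add_result = temp.add('x')
assert add_result is None

# "None or True" is True, so the condition passes for a new element.
assert (temp.add('y') or True) is True

# Start again with an empty set and run the comprehension from the answer.
temp.clear()
result = [c for c in foo if c not in temp and (temp.add(c) or True)]
print(result)  # ['a', 'b', 'c', 'd']
```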

+5
    >>> bar = []
    >>> for i in foo:
    ...     if i not in bar:
    ...         bar.append(i)
    ...
    >>> bar
    ['a', 'b', 'c', 'd']

This would be the most straightforward way to remove duplicates from the list while preserving order as much as possible (although "order" here is essentially a misconception).
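One case where the list-based loop remains useful, sketched below: membership tests on a list rely on equality rather than hashing, so the loop also works for unhashable items such as nested lists, where set() raises TypeError. The rows data is made up for illustration. The trade-off is that each membership test scans the whole result list, so the running time grows quadratically with the input size.

```python
rows = [[1, 2], [3, 4], [1, 2], [5, 6], [3, 4]]  # hypothetical sample data

unique_rows = []
for row in rows:
    # "in" on a list compares with ==, so no hashing is required.
    if row not in unique_rows:
        unique_rows.append(row)

print(unique_rows)  # [[1, 2], [3, 4], [5, 6]]

# The set-based approach fails here because lists are unhashable.
try:
    set(rows)
    set_works = True
except TypeError:
    set_works = False
```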

+3

If you care about order, a readable way is the following:

    def filter_unique(a_list):
        characters = set()
        result = []
        for c in a_list:
            if c not in characters:
                characters.add(c)
                result.append(c)
        return result

Depending on your requirements for speed and space consumption, you may find the above unsuitable. In that case, state your requirements and we can try to do better :-)

+2

If you write a function to do this, I would use a generator; it is a natural fit for this case.

    def unique(iterable):
        yielded = set()
        for item in iterable:
            if item not in yielded:
                yield item
                yielded.add(item)
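
A sketch of how such a generator might be used. Because it is lazy, it can be combined with itertools.islice to take only the first few distinct items, even from an endless stream. The function is repeated here so the example runs on its own; the itertools.cycle input is purely illustrative.

```python
from itertools import cycle, islice

def unique(iterable):
    """Yield each item the first time it is seen, preserving order."""
    yielded = set()
    for item in iterable:
        if item not in yielded:
            yield item
            yielded.add(item)

foo = ['a', 'b', 'c', 'a', 'b', 'd', 'a', 'd']
print(list(unique(foo)))  # ['a', 'b', 'c', 'd']

# Laziness: only as much of the endless input is consumed as needed.
first_three = list(islice(unique(cycle(foo)), 3))
print(first_three)  # ['a', 'b', 'c']
```
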
+2

Inspired by Francesco's answer: rather than writing our own filter()-type function, let's get the built-in to do the work for us:

    def unique(a, s=set()):
        if a not in s:
            s.add(a)
            return True
        return False

Using:

 uniq = filter(unique, orig) 

This may or may not be faster than an answer that implements all of the work in pure Python. Benchmark and see. Of course, this only works once, because the default set s persists between calls, but it demonstrates the concept. The ideal solution, of course, is to use a class:

    class Unique(set):
        def __call__(self, a):
            if a not in self:
                self.add(a)
                return True
            return False

Now we can use it as much as we want:

 uniq = filter(Unique(), orig) 

Once again, we may (or may not) have thrown performance out of the window: the gains from using a built-in function may well be offset by the overhead of a class. I just thought it was an interesting idea.
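
The difference between the two variants can be sketched as follows. The default-argument set is created once, when the function is defined, so it remembers items across calls; each Unique() instance starts empty. list() is wrapped around filter() because filter returns a lazy iterator in Python 3. The orig list is an assumed sample input.

```python
orig = ['a', 'b', 'c', 'a', 'b', 'd', 'a', 'd']  # assumed sample input

def unique(a, s=set()):
    # The default set is shared by every call to this function.
    if a not in s:
        s.add(a)
        return True
    return False

class Unique(set):
    # Each instance is its own, initially empty, set of seen items.
    def __call__(self, a):
        if a not in self:
            self.add(a)
            return True
        return False

first = list(filter(unique, orig))     # ['a', 'b', 'c', 'd']
second = list(filter(unique, orig))    # [] because the shared set is already filled

third = list(filter(Unique(), orig))   # ['a', 'b', 'c', 'd']
fourth = list(filter(Unique(), orig))  # ['a', 'b', 'c', 'd']
```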

+1

This is what you want if you need a sorted list at the end:

    >>> foo = ['a','b','c','a','b','d','a','d']
    >>> bar = sorted(set(foo))
    >>> bar
    ['a', 'b', 'c', 'd']
+1
    import numpy as np
    np.unique(foo)
0

You can use a somewhat ugly list comprehension.

 [l[i] for i in range(len(l)) if l.index(l[i]) == i] 
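
The same idea written with enumerate, as a sketch. An item is kept only when its position equals the index of its first occurrence. The name items stands in for the l of the snippet above. Note that list.index scans from the start of the list each time, so this is quadratic in the length of the list.

```python
items = ['a', 'b', 'c', 'a', 'b', 'd', 'a', 'd']

# Keep an element only at the position where it first appears.
deduped = [x for i, x in enumerate(items) if items.index(x) == i]
print(deduped)  # ['a', 'b', 'c', 'd']
```
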
0
