List filtering in python

Question

I want to filter out duplicate items in my list, e.g.:

foo = ['a','b','c','a','b','d','a','d']

I'm only interested in:

 ['a','b','c','d']

What would be an efficient way to achieve this?

+4

python list unique

Hellnar Oct 20 '09 at 18:12

10 answers

list(set(foo)) works if you are using Python 2.5 or higher, but it does not preserve order.
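A quick check of the set-based approach on foo from the question. Since a set has no defined order, the result is sorted here only so that the printed output is predictable; the order that list(set(foo)) itself returns should be treated as arbitrary.

```python
foo = ['a', 'b', 'c', 'a', 'b', 'd', 'a', 'd']

# set() drops duplicates; list() turns it back into a list,
# but the order is arbitrary, not the order of first appearance.
unique_items = list(set(foo))

# Sort only so that this demo prints a deterministic result.
print(sorted(unique_items))  # ['a', 'b', 'c', 'd']
```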

+21

sc45 Oct 20 '09 at 18:14

Since there is no answer yet that preserves the order of the list, I suggest the following:

 >>> temp = set()
 >>> [c for c in foo if c not in temp and (temp.add(c) or True)]
 ['a', 'b', 'c', 'd']

which can also be written with filter() (wrapped in list() on Python 3, where filter() returns an iterator):

 >>> temp = set()
 >>> list(filter(lambda c: c not in temp and (temp.add(c) or True), foo))
 ['a', 'b', 'c', 'd']

Depending on how many elements are in foo, repeated hash lookups in a set may be faster than repeated linear searches through a temporary list.

c not in temp checks that c is not already in temp. Since temp.add(c) returns None, (temp.add(c) or True) adds c to the set and still evaluates to True, so c is kept in the output list the first time it is seen.
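The same trick can be packaged as a reusable function so the helper set does not leak into the surrounding scope. This is a minimal sketch; the name unique_in_order is hypothetical and not part of the answer above.

```python
def unique_in_order(items):
    """Return items with duplicates removed, keeping first occurrences.

    set.add() returns None, so `(seen.add(c) or True)` records c in the
    set and still evaluates to True, which keeps c in the result.
    """
    seen = set()
    return [c for c in items if c not in seen and (seen.add(c) or True)]


foo = ['a', 'b', 'c', 'a', 'b', 'd', 'a', 'd']
print(unique_in_order(foo))  # ['a', 'b', 'c', 'd']
```

Creating seen inside the function means every call starts with an empty set.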

+5

Mark Rushakoff Oct 21 '09 at 12:47

 >>> bar = []
 >>> for i in foo:
 ...     if i not in bar:
 ...         bar.append(i)
 ...
 >>> bar
 ['a', 'b', 'c', 'd']

This is the simplest way to remove duplicates from the list while preserving order as far as possible (although “order” is arguably the wrong concept here).
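One property of this plain-list approach worth knowing: because it only uses == comparisons, it also works on unhashable items such as lists, where set-based answers raise TypeError. The trade-off is that `in` on a list is a linear scan, so the loop is quadratic overall. A small sketch; the function name unique_by_scan is hypothetical.

```python
def unique_by_scan(items):
    """Keep first occurrences using only == comparisons (no hashing)."""
    result = []
    for item in items:
        # `in` on a list scans it element by element: O(n**2) overall,
        # but it works even for unhashable items such as lists.
        if item not in result:
            result.append(item)
    return result


print(unique_by_scan([[1, 2], [3], [1, 2]]))  # [[1, 2], [3]]
```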

+3

SilentGhost Oct 20 '09 at 18:29

If you care about order, a readable way is the following:

 def filter_unique(a_list):
     characters = set()
     result = []
     for c in a_list:
         if c not in characters:
             characters.add(c)
             result.append(c)
     return result

Depending on your requirements for speed and memory use, you may find the above unsuitable. In that case, state your requirements and we can try to do better :-)
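If speed is the concern, a quick timeit comparison can settle it for your own data. This sketch times filter_unique against a list-only variant (filter_unique_list_scan, a hypothetical name); the input data is made up for illustration, and actual timings depend on your machine, so none are shown.

```python
import random
import timeit


def filter_unique(a_list):
    # Set membership test: O(1) on average per lookup.
    characters = set()
    result = []
    for c in a_list:
        if c not in characters:
            characters.add(c)
            result.append(c)
    return result


def filter_unique_list_scan(a_list):
    # List membership test: scans `result` on every lookup.
    result = []
    for c in a_list:
        if c not in result:
            result.append(c)
    return result


# Hypothetical test data: 2,000 ints drawn from 200 distinct values.
random.seed(0)
data = [random.randrange(200) for _ in range(2000)]

# Both versions must agree before their timings mean anything.
assert filter_unique(data) == filter_unique_list_scan(data)

for func in (filter_unique, filter_unique_list_scan):
    seconds = timeit.timeit(lambda: func(data), number=5)
    print(f"{func.__name__}: {seconds:.4f}s for 5 runs")
```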

+2

Francesco Oct 20 '09 at 18:21

If you are writing a function for this, I would use a generator; this case practically begs for one.

 def unique(iterable):
     yielded = set()
     for item in iterable:
         if item not in yielded:
             yield item
             yielded.add(item)
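Some example uses of the generator. Because it is lazy, it works on any iterable, including strings and even infinite iterators, provided you only ask for as many unique items as actually exist. The use of itertools.cycle and islice is an illustration added here, not part of the answer.

```python
from itertools import cycle, islice


def unique(iterable):
    # Lazily yield each item the first time it is seen.
    yielded = set()
    for item in iterable:
        if item not in yielded:
            yield item
            yielded.add(item)


foo = ['a', 'b', 'c', 'a', 'b', 'd', 'a', 'd']
print(list(unique(foo)))               # ['a', 'b', 'c', 'd']
print(''.join(unique('mississippi')))  # misp

# Take only the first 3 unique items from an endless stream.
print(list(islice(unique(cycle('abcab')), 3)))  # ['a', 'b', 'c']
```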

+2

Dasich Oct 21 '09 at 0:33

source share

Inspired by Francesco's answer: instead of writing our own filter()-type function, let the built-in do the work for us:

 def unique(a, s=set()):
     if a not in s:
         s.add(a)
         return True
     return False

Usage (list() is needed on Python 3, where filter() returns an iterator):

 uniq = list(filter(unique, orig))

This may be faster or slower than the answers that do all the work in pure Python; benchmark and see. Of course, this only works once, because the default argument s is created a single time and keeps its contents between calls, but it demonstrates the concept. The better solution is to use a class:

 class Unique(set):
     def __call__(self, a):
         if a not in self:
             self.add(a)
             return True
         return False

Now we can use it as much as we want:

 uniq = list(filter(Unique(), orig))

Once again, this may or may not throw performance out the window: the benefit of using the built-in function can be offset by the overhead of calling a class instance. I just thought it was an interesting idea.
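To see why the function version only works once while the class does not, run both twice on the same list. The default set s is created when the function is defined, so the second filter() call finds every item already seen; each Unique() instance, by contrast, is a fresh empty set. A sketch using the two definitions from this answer:

```python
def unique(a, s=set()):
    # `s` is created once, at definition time, and is shared by all calls.
    if a not in s:
        s.add(a)
        return True
    return False


class Unique(set):
    # Each Unique() instance is its own empty set.
    def __call__(self, a):
        if a not in self:
            self.add(a)
            return True
        return False


foo = ['a', 'b', 'c', 'a', 'b', 'd', 'a', 'd']

print(list(filter(unique, foo)))    # ['a', 'b', 'c', 'd']
print(list(filter(unique, foo)))    # [] because s remembers the first run

print(list(filter(Unique(), foo)))  # ['a', 'b', 'c', 'd']
print(list(filter(Unique(), foo)))  # ['a', 'b', 'c', 'd']
```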

+1

Chris Lutz Oct 21 '09 at 0:55

This is what you want if you need a sorted list at the end:

 >>> foo = ['a','b','c','a','b','d','a','d']
 >>> bar = sorted(set(foo))
 >>> bar
 ['a', 'b', 'c', 'd']
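Note that sorted(set(...)) orders the result by value; it only matches first-appearance order for foo because foo's first occurrences happen to be alphabetical. This sketch uses a list where the two differ. For comparison it also shows dict.fromkeys, a different technique not taken from any answer here, which keeps first-appearance order because dicts preserve insertion order (guaranteed since Python 3.7).

```python
items = ['c', 'a', 'c', 'b', 'a']

# Sorted and de-duplicated: ordered by value.
print(sorted(set(items)))          # ['a', 'b', 'c']

# First-appearance order (relies on dict insertion order, Python 3.7+).
print(list(dict.fromkeys(items)))  # ['c', 'a', 'b']
```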

+1

hughdbrown Oct 21 '09 at 4:06

 import numpy as np
 np.unique(foo)
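np.unique returns the unique values sorted, as a NumPy array rather than a list. If first-appearance order is needed, one option (an illustration added here, assuming NumPy is installed) is to ask for the index of each value's first occurrence with return_index=True and sort those indices.

```python
import numpy as np

foo = ['a', 'b', 'c', 'a', 'b', 'd', 'a', 'd']

# Sorted unique values, converted back to a plain Python list.
print(np.unique(foo).tolist())  # ['a', 'b', 'c', 'd']

# First-appearance order: sort the indices of first occurrences.
items = np.array(['c', 'a', 'c', 'b', 'a'])
_, first_idx = np.unique(items, return_index=True)
print(items[np.sort(first_idx)].tolist())  # ['c', 'a', 'b']
```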

0

locojay Nov 29 '12 at 20:32

You can do it with a somewhat ugly list comprehension, where l is the list:

 [l[i] for i in range(len(l)) if l.index(l[i]) == i]
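Why this works: l.index(x) returns the position of the first occurrence of x, so an element is kept only at that first position. Each index() call is a linear scan, which makes the comprehension quadratic, but like the plain-list loop it needs only == and so also handles unhashable items. A sketch using enumerate; the function name unique_via_index is hypothetical.

```python
def unique_via_index(l):
    # Keep x only where its first occurrence is, i.e. l.index(x) == i.
    # Each index() call scans from the start: O(n**2) overall.
    return [x for i, x in enumerate(l) if l.index(x) == i]


foo = ['a', 'b', 'c', 'a', 'b', 'd', 'a', 'd']
print(unique_via_index(foo))  # ['a', 'b', 'c', 'd']
```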

0

user1969453 Apr 25 '14 at 1:23

Justin R. · Accepted Answer · 2009-10-20T18:14:26+0000

Convert foo to a set if you don't need to keep the order of the elements.
