Removing some duplicates from a list in Python

I would like to remove a certain number of duplicates from a list without deleting all of them. For example, I have the list [1,2,3,4,4,4,4,4] and I want to delete three of the 4s, so that I am left with [1,2,3,4,4]. A naive way to do this would probably be

    def remove_n_duplicates(remove_from, what, how_many):
        for j in range(how_many):
            remove_from.remove(what)

Is there a way to remove three of the 4s in a single pass through the list while keeping the other two?

+5
5 answers

If you just want to remove the first n occurrences of something from a list, this is pretty easy to do with a generator:

    def remove_n_dupes(remove_from, what, how_many):
        count = 0
        for item in remove_from:
            if item == what and count < how_many:
                # Skip the first how_many matches.
                count += 1
            else:
                yield item

Usage is as follows:

    lst = [1,2,3,4,4,4,4,4]
    print(list(remove_n_dupes(lst, 4, 3)))
    # [1, 2, 3, 4, 4]
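The function in the question mutates the list in place, whereas the generator produces a new sequence. If in-place behaviour is wanted, one possible approach is slice assignment, sketched below. The name `remove_n_dupes_inplace` is made up for this illustration and is not part of the original answer.

```python
def remove_n_dupes(remove_from, what, how_many):
    count = 0
    for item in remove_from:
        if item == what and count < how_many:
            count += 1
        else:
            yield item


def remove_n_dupes_inplace(remove_from, what, how_many):
    # Hypothetical helper: slice assignment replaces the contents of the
    # existing list object instead of binding the name to a new list.
    remove_from[:] = remove_n_dupes(remove_from, what, how_many)


lst = [1, 2, 3, 4, 4, 4, 4, 4]
alias = lst  # a second reference to the same list object
remove_n_dupes_inplace(lst, 4, 3)
print(lst)           # [1, 2, 3, 4, 4]
print(alias is lst)  # True
```

Any other reference to the list (here `alias`) sees the change, which matches what `list.remove` does in the naive version.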

Keeping at most a certain number of copies of every element is similarly easy if we use a little auxiliary storage:

    from collections import Counter

    def keep_n_dupes(remove_from, how_many):
        counts = Counter()
        for item in remove_from:
            counts[item] += 1
            # Yield an item only while it is within its quota.
            if counts[item] <= how_many:
                yield item

Similar usage:

    lst = [1,1,1,1,2,3,4,4,4,4,4]
    print(list(keep_n_dupes(lst, 2)))
    # [1, 1, 2, 3, 4, 4]

Here you pass in the list and the maximum number of copies of each item that you want to keep. The caveat is that the items must be hashable ...
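If the items are not hashable (nested lists, for example), a `Counter` cannot be used as is. A minimal sketch of one workaround, assuming the items at least support `==`, keeps two parallel lists instead. The name `keep_n_dupes_unhashable` is hypothetical, and the linear lookup makes this slower than the `Counter` version.

```python
def keep_n_dupes_unhashable(remove_from, how_many):
    # Hypothetical variant: items only need to support ==.
    # Each lookup is a linear scan, so the cost is O(n * k) for k distinct items.
    seen = []    # distinct items met so far
    counts = []  # counts[i] is how often seen[i] has appeared
    for item in remove_from:
        try:
            i = seen.index(item)
        except ValueError:
            seen.append(item)
            counts.append(0)
            i = len(seen) - 1
        counts[i] += 1
        if counts[i] <= how_many:
            yield item


data = [[1], [1], [1], [2], [1], [2]]
print(list(keep_n_dupes_unhashable(data, 2)))  # [[1], [1], [2], [2]]
```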

+6

You can use Python's set functionality with & to create a list of lists, and then flatten it. The resulting list will be [1, 2, 3, 4, 4].

    x = [1,2,3,4,4,4,4,4]
    x2 = [val
          for sublist in [[item] * max(1, x.count(item) - 3)
                          for item in set(x) & set(x)]
          for val in sublist]

As a function, you will have the following.

    def remove_n_duplicates(remove_from, what, how_many):
        return [val
                for sublist in [[item] * max(1, remove_from.count(item) - how_many)
                                if item == what
                                else [item] * remove_from.count(item)
                                for item in set(remove_from) & set(remove_from)]
                for val in sublist]
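Two things are worth keeping in mind with the set-based comprehension: a set does not guarantee iteration order, and `count` rescans the list for every distinct item. As a possible alternative that stays a single comprehension, here is a sketch that uses `itertools.count` as a running tally. This is a different technique from the one above, not a restatement of it.

```python
from itertools import count


def remove_n_duplicates(remove_from, what, how_many):
    matches = count()  # yields 0, 1, 2, ... once per matching item
    # `or` short-circuits, so next(matches) only runs when item == what.
    return [item for item in remove_from
            if item != what or next(matches) >= how_many]


print(remove_n_duplicates([1, 2, 3, 4, 4, 4, 4, 4], 4, 3))  # [1, 2, 3, 4, 4]
print(remove_n_duplicates([4, 1, 4, 2, 4, 3, 4], 4, 2))     # [1, 2, 4, 3, 4]
```

This makes one pass, preserves the original order, and does not require the items to be hashable.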
0

If the list is sorted, there is a quick solution:

    def remove_n_duplicates(remove_from, what, how_many):
        # Find the first occurrence; in a sorted list all copies are adjacent.
        index = -1
        for i in range(len(remove_from)):
            if remove_from[i] == what:
                index = i
                break
        if index < 0 or index + how_many > len(remove_from):
            # There aren't enough things to remove.
            return
        for i in range(index, index + how_many):
            if remove_from[i] != what:
                # Again, there aren't enough things to remove.
                return
        end_index = index + how_many
        # Drop the first how_many copies and keep everything else.
        return remove_from[:index] + remove_from[end_index:]

Note that this returns a new list, so you will want to write arr = remove_n_duplicates(arr, 4, 3).
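For a sorted list, the run of equal items can also be located by binary search. The following is a sketch using the standard `bisect` module, assuming the list is sorted in ascending order. The name `remove_n_sorted` is hypothetical, and unlike the function above it removes as many copies as are available instead of returning None when there are too few.

```python
from bisect import bisect_left, bisect_right


def remove_n_sorted(sorted_list, what, how_many):
    # Assumes sorted_list is in ascending order.
    lo = bisect_left(sorted_list, what)   # index of the first copy of `what`
    hi = bisect_right(sorted_list, what)  # index just past the last copy
    n = min(how_many, hi - lo)            # never remove more than exist
    return sorted_list[:lo] + sorted_list[lo + n:]


arr = [1, 2, 3, 4, 4, 4, 4, 4]
arr = remove_n_sorted(arr, 4, 3)
print(arr)  # [1, 2, 3, 4, 4]
```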

0

I can solve it differently using collections.

    from collections import Counter

    li = [1,2,3,4,4,4,4]
    cntLi = Counter(li)
    # Prints only the distinct values, not a list with duplicates removed.
    print(list(cntLi.keys()))
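The snippet above only lists the distinct values. One way a `Counter` could be taken further, sketched here as an assumption about the intent, is to lower the count of the target item and rebuild the list with `Counter.elements()`. The name `remove_n_with_counter` is hypothetical. Note that `elements()` groups equal items together, so the original order is kept only when equal items were already adjacent, and the first-seen ordering relies on Python 3.7 or later.

```python
from collections import Counter


def remove_n_with_counter(remove_from, what, how_many):
    counts = Counter(remove_from)
    counts[what] -= how_many
    # elements() skips entries whose count is zero or negative.
    return list(counts.elements())


li = [1, 2, 3, 4, 4, 4, 4]
print(remove_n_with_counter(li, 4, 3))  # [1, 2, 3, 4]
```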
-1

Here is another trick that can sometimes be useful. It should not be taken as a recommended recipe.

    def remove_n_duplicates(remove_from, what, how_many):
        exec('remove_from.remove(what);'*how_many)
-1
