Removing duplicates from a list of lists in Python

Can someone suggest a good solution for removing duplicates from nested lists if you want to evaluate duplicates based on the first element of each nested list?

The main list is as follows:

L = [['14', '65', 76], ['2', '5', 6], ['7', '12', 33], ['14', '22', 46]] 

If another nested list has the same element in the first position [k][0] as one that has already occurred, I would like to delete that list and get this result:

 L = [['14', '65', 76], ['2', '5', 6], ['7', '12', 33]] 

Can you suggest an algorithm to achieve this?

+9
python list
Jul 17 '09 at 13:45
5 answers

Do you care about maintaining order, or about which duplicate gets deleted? If not, then:

 dict((x[0], x) for x in L).values() 

will do it. If you want to keep the order, and keep the first one you find, then:

 def unique_items(L):
     found = set()
     for item in L:
         if item[0] not in found:
             yield item
             found.add(item[0])

 print(list(unique_items(L)))
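On Python 3.7+, where plain dicts preserve insertion order, the same first-occurrence-wins behavior can also be had without a generator — a sketch of my own, not part of the original answer:

```python
L = [['14', '65', 76], ['2', '5', 6], ['7', '12', 33], ['14', '22', 46]]

# setdefault only stores a value if the key is absent,
# so the first row for each key wins.
seen = {}
for item in L:
    seen.setdefault(item[0], item)

result = list(seen.values())
print(result)  # [['14', '65', 76], ['2', '5', 6], ['7', '12', 33]]
```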
+27
Jul 17 '09 at 13:54

Use a dict, like this:

 L = {'14': ['65', 76], '2': ['5', 6], '7': ['12', 33]}
 L['14'] = ['22', 46]

If you get the original list from some external source, convert it like this:

 L = [['14', '65', 76], ['2', '5', 6], ['7', '12', 33], ['14', '22', 46]]
 L_dict = dict((x[0], x[1:]) for x in L)
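Note that with this conversion, later rows overwrite earlier ones for a repeated key, so each key ends up with the *last* duplicate's tail. A quick sketch (my own) of looking up a key and rebuilding a list of lists from the dict:

```python
L = [['14', '65', 76], ['2', '5', 6], ['7', '12', 33], ['14', '22', 46]]
L_dict = dict((x[0], x[1:]) for x in L)

# Later entries overwrite earlier ones, so '14' maps to the last row's tail.
print(L_dict['14'])  # ['22', 46]

# Rebuild the list-of-lists form from the dict.
L_rebuilt = [[k] + v for k, v in L_dict.items()]
```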
+3
Jul 17 '09 at 13:52

I'm not sure what you mean by "another list", so I assume you mean the lists inside L:

 a = []
 L = [['14', '65', 76], ['2', '5', 6], ['7', '12', 33], ['14', '22', 46], ['7', 'a', 'b']]
 for item in L:
     if item[0] not in a:
         a.append(item[0])
         print(item)
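The loop above only prints the unique rows; a small variant (my sketch) collects them into a new list instead:

```python
L = [['14', '65', 76], ['2', '5', 6], ['7', '12', 33], ['14', '22', 46], ['7', 'a', 'b']]

seen_keys = []
unique = []
for item in L:
    if item[0] not in seen_keys:  # linear scan; fine for small lists
        seen_keys.append(item[0])
        unique.append(item)

print(unique)  # [['14', '65', 76], ['2', '5', 6], ['7', '12', 33]]
```

For long lists a set would make the membership test O(1), as in the accepted answer.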
0
Jul 17 '09 at 13:50

If the order doesn't matter, the code below

 print([[k] + v for (k, v) in dict([[a[0], a[1:]] for a in reversed(L)]).items()])

gives

[['2', '5', 6], ['14', '65', 76], ['7', '12', 33]]

(the key order may vary, since it depends on the dict's iteration order)
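Feeding reversed(L) into dict works because later duplicates are inserted first and then overwritten by earlier rows, so each key keeps its *first* value from L. A quick check (my sketch):

```python
L = [['14', '65', 76], ['2', '5', 6], ['7', '12', 33], ['14', '22', 46]]

# The duplicate ['14', '22', 46] is inserted first (reversed order) and is
# then overwritten by ['14', '65', 76], the first occurrence in L.
deduped = [[k] + v for k, v in dict([a[0], a[1:]] for a in reversed(L)).items()]
```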

0
Jul 17 '09 at 14:03

Use Pandas:

 import pandas as pd

 L = [['14', '65', 76], ['2', '5', 6], ['7', '12', 33], ['14', '22', 46], ['7', 'a', 'b']]
 df = pd.DataFrame(L)
 df = df.drop_duplicates()
 L_no_duplicates = df.values.tolist()

Note that drop_duplicates() with no arguments only drops rows that are identical in every column. If you want to drop duplicates based on specific columns only, pass them as the subset argument instead:

 df = df.drop_duplicates(subset=[1, 2])
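For this question specifically, that means deduplicating on the first column only — a sketch (keep='first' is the pandas default, shown here for clarity):

```python
import pandas as pd

L = [['14', '65', 76], ['2', '5', 6], ['7', '12', 33], ['14', '22', 46]]
df = pd.DataFrame(L)

# Drop rows whose value in column 0 has been seen before.
deduped = df.drop_duplicates(subset=[0], keep='first').values.tolist()
print(deduped)  # [['14', '65', 76], ['2', '5', 6], ['7', '12', 33]]
```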
0
Mar 17 '16 at 8:57
