Python pandas remove duplicates sequentially

Question

Is there a function that ensures the index is unique, or can this only be handled in Python itself, for example by converting the Series to a dict and back?

As noted in the comments below: pandas is a Python project built on NumPy / SciPy.

Converting with to_dict and back works, but I'm sure it gets slow when the data gets big.

    In [24]: a = pandas.Series([1,2,3], index=[1,1,2])

    In [25]: a
    Out[25]:
    1    1
    1    2
    2    3

    In [26]: a = a.to_dict()

    In [27]: a
    Out[27]: {1: 2, 2: 3}

    In [28]: a = pandas.Series(a)

    In [29]: a
    Out[29]:
    1    2
    2    3

python pandas

mathtick Oct 18 '12 at 19:56

2 answers

BTW, we plan to add a drop_duplicates method to Series, similar to DataFrame.drop_duplicates, in the near future.
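For duplicated index labels specifically, a boolean mask is one possible alternative. The sketch below is not from the original answer and assumes a later pandas release that provides Index.duplicated with the keep parameter. In such releases Series.drop_duplicates works on values, not on index labels, so it would not solve this particular problem on its own.

```python
import pandas as pd

# Same data as in the question: label 1 appears twice.
a = pd.Series([1, 2, 3], index=[1, 1, 2])

# Index.duplicated(keep="last") flags every occurrence of a label
# except the last one; negating the mask keeps only the last occurrence.
keep_last = a[~a.index.duplicated(keep="last")]

# keep="first" keeps the first occurrence of each label instead.
keep_first = a[~a.index.duplicated(keep="first")]

print(keep_last)
print(keep_first)
```

Keeping the last occurrence gives the same result as the to_dict round trip in the question, without building an intermediate dict.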

Wes McKinney Oct 20 '12 at 15:20

root · Accepted Answer · 2012-10-18T20:07:06+0000

Use groupby on the index level, then first() or last() to keep the first or last value for each duplicated label.

    In [279]: s
    Out[279]:
    a    1
    b    2
    b    3
    b    4
    e    5

    In [280]: grouped = s.groupby(level=0)

    In [281]: grouped.first()
    Out[281]:
    a    1
    b    2
    e    5

    In [282]: grouped.last()
    Out[282]:
    a    1
    b    4
    e    5
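The session above starts from an existing s. A self-contained version might look like the sketch below, where the construction of s is an assumption reconstructed from the printed output:

```python
import pandas as pd

# Assumed construction of the Series shown above: label "b" appears three times.
s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "b", "b", "e"])

# Group rows by their index label (level 0 of the index).
grouped = s.groupby(level=0)

# One row per label: the first or the last value seen for that label.
first = grouped.first()
last = grouped.last()

print(first)
print(last)
```

Both results have a unique index. Which one to use depends on whether earlier or later values should win.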
