Python pandas: remove duplicate index entries

Is there a function that ensures the index is unique, or do I have to handle this in plain Python, for example by converting the Series to a dict and back?

As noted in the comments: pandas is a Python library built on NumPy/SciPy.

Converting with to_dict and back works, but I suspect it gets slow on large data:

    In [24]: a = pandas.Series([1,2,3], index=[1,1,2])
    In [25]: a
    Out[25]:
    1    1
    1    2
    2    3
    In [26]: a = a.to_dict()
    In [27]: a
    Out[27]: {1: 2, 2: 3}
    In [28]: a = pandas.Series(a)
    In [29]: a
    Out[29]:
    1    2
    2    3
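A dict round-trip is not strictly required: pandas can check for and filter repeated index labels directly, which stays vectorized instead of building a Python dict. A minimal sketch, assuming a pandas version where Index.duplicated accepts the keep argument; keep="last" is chosen here only to match the to_dict result above, where later keys overwrite earlier ones:

```python
import pandas as pd

# The Series from the question: label 1 appears twice.
a = pd.Series([1, 2, 3], index=[1, 1, 2])

# Index.is_unique reports whether any label repeats.
print(a.index.is_unique)  # False

# Flag every occurrence of a label except its last one, then keep the rest.
# keep="last" mimics to_dict(), where a later key overwrites an earlier one.
deduped = a[~a.index.duplicated(keep="last")]
print(deduped)
```

Using keep="first" instead would keep the earliest value for each label.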
2 answers

Group by the index level and take first() or last():

    In [279]: s
    Out[279]:
    a    1
    b    2
    b    3
    b    4
    e    5
    In [280]: grouped = s.groupby(level=0)
    In [281]: grouped.first()
    Out[281]:
    a    1
    b    2
    e    5
    In [282]: grouped.last()
    Out[282]:
    a    1
    b    4
    e    5
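The same session as a standalone script, for copy-paste. The Series is rebuilt here from the values shown in Out[279], since the answer does not show how s was created:

```python
import pandas as pd

# Data reconstructed from the session output: label "b" occurs three times.
s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "b", "b", "e"])

# Group rows by the index labels themselves (level 0 of the index).
grouped = s.groupby(level=0)

first = grouped.first()  # first value seen for each label
last = grouped.last()    # last value seen for each label

print(first)
print(last)
```

Note that first() and last() return the first/last non-null value in each group, so missing values are skipped, and groupby sorts the labels by default; pass sort=False to groupby to keep them in order of appearance.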

BTW, we plan to add a drop_duplicates method to Series, like DataFrame.drop_duplicates, in the near future.
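Series.drop_duplicates is available in current pandas, but it compares values, not index labels, so on its own it does not make the index unique. A small sketch with hypothetical data, assuming a recent pandas version:

```python
import pandas as pd

# Hypothetical data: label "b" repeats, and the value 2 repeats under different labels.
s = pd.Series([1, 2, 2, 4], index=["a", "b", "c", "b"])

# Series.drop_duplicates compares values only: the second 2 (label "c") is removed,
# but both "b" rows survive because their values (2 and 4) differ.
by_value = s.drop_duplicates()
print(by_value)

# The index still contains a repeated label after dropping duplicate values.
print(by_value.index.is_unique)  # False
```

To deduplicate on labels, filter with Index.duplicated or use the groupby approach instead.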


Source: https://habr.com/ru/post/928064/

