Conditional removal of duplicates pandas python

Question


Is there a way to conditionally delete duplicates (specifically with drop_duplicates) in a pandas DataFrame with about 10 columns and 400,000 rows? That is, I want to keep or drop rows based on two columns: if the combination of the date column and the storage # column is unique, keep the row; otherwise, drop it.
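A minimal sketch of the asker's case, assuming hypothetical columns named date, storage and sales filled with made-up data. "Keep the row only if the combination is unique" can be read two ways, and in current pandas versions the keep parameter of drop_duplicates selects between them:

```python
import pandas as pd

# Hypothetical data: 'date', 'storage' and 'sales' are assumed column names
# standing in for the asker's real ~10-column frame.
df = pd.DataFrame({
    "date": ["2015-05-01", "2015-05-01", "2015-05-01", "2015-05-02"],
    "storage": [1, 1, 2, 1],
    "sales": [100, 150, 80, 120],
})

# Default keep="first": keep one row per (date, storage) pair, drop later repeats.
first_only = df.drop_duplicates(subset=["date", "storage"])

# keep=False: drop every row whose (date, storage) pair occurs more than once,
# so only truly unique combinations survive.
unique_only = df.drop_duplicates(subset=["date", "storage"], keep=False)

print(first_only.index.tolist())   # [0, 2, 3]
print(unique_only.index.tolist())  # [2, 3]
```

Both calls return a new DataFrame and leave df untouched; the remaining columns (here sales) are carried along unchanged.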


python python-2.7 numpy pandas dataframe

Morgan sacco May 03 '15 at 4:00


1 answer

Zero · Answer 1 · 2015-05-03T04:08:02+0000

Use drop_duplicates, which returns a DataFrame with duplicate rows removed, optionally considering only specific columns.

Suppose the initial DataFrame looks like this:

    In [34]: df
    Out[34]:
      Col1 Col2  Col3
    0    A    B    10
    1    A    B    20
    2    A    C    20
    3    C    B    20
    4    A    B    20
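To make the session reproducible, the example frame can be rebuilt as below. This is a sketch that reads the flattened printout as Col1 holding the first letter, Col2 the second letter and Col3 the number:

```python
import pandas as pd

# Rebuild the answer's example frame (values read from the printed output).
df = pd.DataFrame({
    "Col1": ["A", "A", "A", "C", "A"],
    "Col2": ["B", "B", "C", "B", "B"],
    "Col3": [10, 20, 20, 20, 20],
})
print(df)
```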

To keep only unique combinations of specific columns, such as 'Col1' and 'Col2':

    In [35]: df.drop_duplicates(['Col1', 'Col2'])
    Out[35]:
      Col1 Col2  Col3
    0    A    B    10
    2    A    C    20
    3    C    B    20
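The column-subset call behaves like filtering with the boolean mask from DataFrame.duplicated, which flags each row whose key already appeared earlier. A short sketch on the same example data showing that the two give the same frame:

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": ["A", "A", "A", "C", "A"],
    "Col2": ["B", "B", "C", "B", "B"],
    "Col3": [10, 20, 20, 20, 20],
})

# True for every row whose (Col1, Col2) pair was already seen in an earlier row.
mask = df.duplicated(subset=["Col1", "Col2"])
print(mask.tolist())  # [False, True, False, False, True]

# Keeping the unflagged rows matches drop_duplicates on the same columns.
via_mask = df[~mask]
via_drop = df.drop_duplicates(subset=["Col1", "Col2"])
print(via_mask.equals(via_drop))  # True
```

Because the mask is an ordinary boolean Series, it can be combined with other row conditions using & and | when the removal rule needs to be more conditional than "first occurrence wins".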

To keep only unique combinations of all columns:

    In [36]: df.drop_duplicates()
    Out[36]:
      Col1 Col2  Col3
    0    A    B    10
    1    A    B    20
    2    A    C    20
    3    C    B    20
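Across all columns only rows 1 and 4 are identical, so the default call drops row 4. A sketch, assuming a current pandas version with the keep parameter, showing how keep="last" keeps the later copy instead, and how reset_index renumbers the result:

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": ["A", "A", "A", "C", "A"],
    "Col2": ["B", "B", "C", "B", "B"],
    "Col3": [10, 20, 20, 20, 20],
})

# keep="last": of the identical rows 1 and 4, row 4 is kept and row 1 dropped.
last_kept = df.drop_duplicates(keep="last")
print(last_kept.index.tolist())  # [0, 2, 3, 4]

# The result is a new DataFrame; reset_index(drop=True) gives a fresh 0..n-1 index.
renumbered = last_kept.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]

# The original frame is unchanged.
print(len(df))  # 5
```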
