Conditionally removing duplicates in pandas (Python)

Is there a way to conditionally delete duplicates (specifically with drop_duplicates) in a pandas DataFrame with about 10 columns and 400,000 rows? That is, I want to keep every row that satisfies a condition on two columns: if the combination of date (one column) and store # (another column) is unique, keep the row; otherwise, drop it.

python numpy pandas dataframe
1 answer

Use drop_duplicates: it returns a copy of the DataFrame with duplicate rows removed, optionally considering only specific columns when deciding what counts as a duplicate.
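Applied to the question's setup, a minimal sketch might look like this. The column names 'date', 'store' and 'sales' and the data are hypothetical stand-ins for the real 10-column frame; only the (date, store) pair is used to detect duplicates.

```python
import pandas as pd

# Hypothetical data: 'date' and 'store' stand in for the question's
# date and store-number columns; 'sales' is just an extra column.
df = pd.DataFrame({
    "date": ["2015-01-01", "2015-01-01", "2015-01-02", "2015-01-01"],
    "store": [1, 1, 1, 2],
    "sales": [100, 150, 80, 120],
})

# Keep the first row for each (date, store) pair; the other columns are
# ignored when deciding whether two rows are duplicates.
deduped = df.drop_duplicates(subset=["date", "store"])
print(deduped)
```

Rows 0 and 1 share the same date and store, so only row 0 survives; rows 2 and 3 have unique pairs and are kept.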

Suppose the initial DataFrame looks like this:

    In [34]: df
    Out[34]:
      Col1 Col2  Col3
    0    A    B    10
    1    A    B    20
    2    A    C    20
    3    C    B    20
    4    A    B    20
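To reproduce this sample frame, one way to build it is shown below (the values are read off the session output, with Col1 and Col2 holding single letters and Col3 an integer):

```python
import pandas as pd

# Sample frame from the answer: rows 1 and 4 are identical in every column,
# and rows 0, 1 and 4 share the same (Col1, Col2) pair.
df = pd.DataFrame({
    "Col1": ["A", "A", "A", "C", "A"],
    "Col2": ["B", "B", "C", "B", "B"],
    "Col3": [10, 20, 20, 20, 20],
})
print(df)
```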

To keep one row per unique combination of specific columns, here 'Col1' and 'Col2' (by default the first occurrence of each combination is kept and later ones are dropped):

    In [35]: df.drop_duplicates(subset=['Col1', 'Col2'])
    Out[35]:
      Col1 Col2  Col3
    0    A    B    10
    2    A    C    20
    3    C    B    20
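Which occurrence survives is controlled by the keep parameter of drop_duplicates. If "otherwise, drop it" in the question means that every row whose (date, store) pair appears more than once should be removed, with none kept, keep=False is the relevant setting. A short sketch on the sample frame:

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": ["A", "A", "A", "C", "A"],
    "Col2": ["B", "B", "C", "B", "B"],
    "Col3": [10, 20, 20, 20, 20],
})
cols = ["Col1", "Col2"]

first = df.drop_duplicates(subset=cols)                   # keep="first" (default)
last = df.drop_duplicates(subset=cols, keep="last")       # keep the last occurrence
only_unique = df.drop_duplicates(subset=cols, keep=False) # drop every duplicated pair

print(first.index.tolist())        # [0, 2, 3]
print(last.index.tolist())         # [2, 3, 4]
print(only_unique.index.tolist())  # [2, 3]
```

With keep=False, all three ('A', 'B') rows disappear because that pair is not unique.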

To consider all columns, so that only rows identical in every column count as duplicates:

    In [36]: df.drop_duplicates()
    Out[36]:
      Col1 Col2  Col3
    0    A    B    10
    1    A    B    20
    2    A    C    20
    3    C    B    20
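If you want to inspect which rows would be dropped before removing them, DataFrame.duplicated returns a boolean mask built with the same subset/keep rules; filtering with its negation gives the same result as drop_duplicates. A small sketch on the sample frame:

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": ["A", "A", "A", "C", "A"],
    "Col2": ["B", "B", "C", "B", "B"],
    "Col3": [10, 20, 20, 20, 20],
})

# True for each row that repeats an earlier row in every column.
mask = df.duplicated()
print(mask.tolist())  # [False, False, False, False, True]

# Filtering with the inverted mask matches drop_duplicates().
kept = df[~mask]
print(kept.equals(df.drop_duplicates()))  # True
```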