I want to open a file, read it, remove duplicates based on two of its columns, and then use the de-duplicated data to perform some calculations. To do this, I use pandas.DataFrame.drop_duplicates, which removes the duplicate rows but leaves gaps in the index: the surviving rows keep their original labels. For example, after row 1 is dropped (its Var2/Var3 pair, 2 and 3, repeats row 0), file1 becomes file2:
file1:
   Var1  Var2  Var3  Var4
0    52     2     3    89
1    65     2     3    43
2    15     1     3    78
3    33     2     4    67

file2:
   Var1  Var2  Var3  Var4
0    52     2     3    89
2    15     1     3    78
3    33     2     4    67
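A minimal sketch reproducing the gap in the index, assuming file1 is built in memory from the table above instead of being read from "filename.txt". drop_duplicates keeps the first row of each (Var2, Var3) pair by default, so row 1 is removed and the remaining rows keep the labels 0, 2 and 3:

```python
import pandas as pd

# file1 from the question, built in memory instead of read from "filename.txt"
file1 = pd.DataFrame(
    {
        "Var1": [52, 65, 15, 33],
        "Var2": [2, 2, 1, 2],
        "Var3": [3, 3, 3, 4],
        "Var4": [89, 43, 78, 67],
    }
)

# Keep the first row of each (Var2, Var3) pair; row 1 repeats row 0's (2, 3)
file2 = file1.drop_duplicates(["Var2", "Var3"])

print(file2.index.tolist())  # [0, 2, 3]: surviving rows keep their old labels
```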
To use file2 as a DataFrame in later steps, I need to reindex it so that the index runs 0, 1, 2, ... again.
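One way to get a fresh 0, 1, 2, ... index, sketched here under the assumption of pandas 1.0 or later (where drop_duplicates accepts ignore_index), is to chain reset_index(drop=True); drop=True discards the old labels instead of inserting them as a new "index" column. The data is the same in-memory stand-in for file1 as above:

```python
import pandas as pd

# Same data as file1 in the question, built in memory
file1 = pd.DataFrame(
    {
        "Var1": [52, 65, 15, 33],
        "Var2": [2, 2, 1, 2],
        "Var3": [3, 3, 3, 4],
        "Var4": [89, 43, 78, 67],
    }
)

# Option 1: renumber after de-duplicating; drop=True throws the old labels away
file2 = file1.drop_duplicates(["Var2", "Var3"]).reset_index(drop=True)

# Option 2 (pandas >= 1.0): let drop_duplicates renumber the result itself
file2_alt = file1.drop_duplicates(["Var2", "Var3"], ignore_index=True)

print(file2.index.tolist())  # [0, 1, 2]
```

Both options return a new DataFrame rather than modifying file1, so the result can be assigned to a new name as in the question.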
Here is the code I'm using:
import pandas as pd

file1 = pd.read_csv("filename.txt", sep='|', header=None, names=['Var1', 'Var2', 'Var3', 'Var4'])
file2 = file1.drop_duplicates(["Var2", "Var3"])
Although the code works and gives correct results, the reindexing step produces the following warning:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the the caveats in the documentation: http:
I checked the link, but cannot figure out how to change the code. Any ideas on how to fix this?
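A sketch of one common way to avoid the warning, under the assumption (not confirmed by the code shown) that it comes from assigning to a column of the frame returned by drop_duplicates, which some pandas versions treat as a possible slice of file1. Calling .copy() gives file2 its own data, so later assignments are not flagged; it is harmless even if reset_index already returned an independent frame. The "Total" column is hypothetical, standing in for the calculations in the question:

```python
import warnings

import pandas as pd

# Same data as file1 in the question, built in memory
file1 = pd.DataFrame(
    {
        "Var1": [52, 65, 15, 33],
        "Var2": [2, 2, 1, 2],
        "Var3": [3, 3, 3, 4],
        "Var4": [89, 43, 78, 67],
    }
)

# De-duplicate, renumber 0..n-1, and take an explicit independent copy
file2 = file1.drop_duplicates(["Var2", "Var3"]).reset_index(drop=True).copy()

# Hypothetical calculation: "Total" is illustrative, not from the question
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    file2["Total"] = file2["Var1"] + file2["Var4"]

print(file2["Total"].tolist())  # [141, 93, 100]
print(any(w.category.__name__ == "SettingWithCopyWarning" for w in caught))  # False
```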