I have a CSV file with multiple entries. CSV example:
user, phone, email
joe, 123, joe@x.com
mary, 456, mary@x.com
ed, 123, ed@x.com
I am trying to remove duplicates based on a specific column of the CSV, but the code below raises “list index out of range”. I thought that by comparing row[1] with newrows[1] I would find all the duplicates and write only the unique entries to file2.csv. This does not work, and I do not understand why.
import csv

# Python 2: CSV files are opened in binary mode ('rb' / 'wb')
f1 = csv.reader(open('file1.csv', 'rb'))
newrows = []
for row in f1:
    if row[1] not in newrows[1]:
        newrows.append(row)
writer = csv.writer(open("file2.csv", "wb"))
writer.writerows(newrows)
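Why the loop fails, as far as the code shows: newrows starts empty, so newrows[1] raises IndexError on the very first row. Even once two rows have been kept, newrows[1] is only the second kept row as a whole list, so the test asks whether the phone is one of that row's fields, not whether it appears in the phone column of all kept rows. A minimal sketch of the intended comparison, assuming Python 3 and the sample data held in memory (dedupe_rows and sample are hypothetical names used only for illustration):

```python
import csv
import io

def dedupe_rows(reader, column=1):
    """Return the rows from reader, keeping the first row for each value
    in the given column and preserving file order."""
    newrows = []
    for row in reader:
        # Compare against the same column of every kept row,
        # not against newrows[1] (which is one whole row).
        if all(kept[column] != row[column] for kept in newrows):
            newrows.append(row)
    return newrows

# Hypothetical in-memory copy of the sample file
sample = ("user, phone, email\n"
          "joe, 123, joe@x.com\n"
          "mary, 456, mary@x.com\n"
          "ed, 123, ed@x.com\n")
rows = dedupe_rows(csv.reader(io.StringIO(sample)))
for row in rows:
    print(row)
```

Each new row is checked against every row kept so far, so this is quadratic in the number of unique rows; that is usually fine for small files.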
My end result is a list that keeps the rows in their original file order (a set won't work ... right?), which should look like this:
user, phone, email
joe, 123, joe@x.com
mary, 456, mary@x.com
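On the set question: a set cannot hold the rows in order, but it can record which phone numbers have already been written while the rows themselves are written in reading order. A sketch of that approach, assuming Python 3 and file-like objects (dedupe_csv is a hypothetical helper, and the demo uses an in-memory copy of the sample instead of file1.csv / file2.csv):

```python
import csv
import io

def dedupe_csv(src, dst, column=1):
    """Copy rows from the CSV file object src to dst, keeping only the
    first row for each value in the given column. Returns rows written."""
    seen = set()                    # keys already written; used only for lookups
    writer = csv.writer(dst)
    written = 0
    for row in csv.reader(src):
        key = row[column].strip()   # ignore the space after each comma
        if key in seen:
            continue                # duplicate key: skip this row
        seen.add(key)
        writer.writerow(row)        # rows go out in the order they were read
        written += 1
    return written

# Demo on an in-memory copy of the sample data
src = io.StringIO("user, phone, email\n"
                  "joe, 123, joe@x.com\n"
                  "mary, 456, mary@x.com\n"
                  "ed, 123, ed@x.com\n")
dst = io.StringIO()
count = dedupe_csv(src, dst)
print(dst.getvalue())
```

The output order comes from the loop over the reader, not from the set, so the original order is preserved. With real files in Python 3, open file1.csv for reading and file2.csv with mode "w", both with newline="", and pass the two file objects in; the 'rb' / 'wb' modes in the question are the Python 2 convention.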