Python: removing duplicate CSV records

I have a CSV file with multiple entries. Example CSV:

user, phone, email
joe, 123, joe@x.com
mary, 456, mary@x.com
ed, 123, ed@x.com

I am trying to remove duplicates based on a specific column of the CSV, but with the code below I get a “list index out of range” error. I thought that by comparing row[1] with newrows[1] I would find all the duplicates and write only the unique entries to file2.csv. This does not work, and I do not understand why.

import csv

f1 = csv.reader(open('file1.csv', 'rb'))
newrows = []
for row in f1:
    if row[1] not in newrows[1]:
        newrows.append(row)
writer = csv.writer(open("file2.csv", "wb"))
writer.writerows(newrows)

My end result is to have a list that keeps the sequence of the file (so a set won't work ... right?), which should look like this:

user, phone, email
joe, 123, joe@x.com
mary, 456, mary@x.com
Answer:

row[1] refers to the second column in the current row (the phone number). That part is fine.

newrows.append(row) appends the entire row, so newrows ends up as a list of rows, not a list of phone numbers.

row[1] in newrows would therefore never find a match, because a single phone number is never equal to a whole row. More importantly, newrows[1] asks for the second element of newrows, and on the first iteration the list is still empty, which is what raises the “list index out of range” error.
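You can reproduce the error directly in the interpreter; the indexing fails before any comparison happens:

>>> newrows = []
>>> newrows[1]
Traceback (most recent call last):
  ...
IndexError: list index out of range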

Here is a corrected version that tracks the phone numbers seen so far in a set:

import csv

f1 = csv.reader(open('file1.csv', 'rb'))
writer = csv.writer(open("file2.csv", "wb"))
phone_numbers = set()                   # phone numbers already written
for row in f1:
    if row[1] not in phone_numbers:     # first time we see this phone number
        writer.writerow(row)            # keep the row, in file order
        phone_numbers.add(row[1])
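As for "set won't work ... right?": a set alone would indeed lose the file order, but a set combined with a list gives you both uniqueness and order. If you want the unique rows in a list (as the expected output above suggests), a minimal sketch of the same idea follows; the helper name seen is just illustrative:

import csv

f1 = csv.reader(open('file1.csv', 'rb'))
seen = set()        # phone numbers encountered so far
newrows = []        # unique rows, in original file order
for row in f1:
    if row[1] not in seen:
        newrows.append(row)
        seen.add(row[1])

csv.writer(open('file2.csv', 'wb')).writerows(newrows)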