Python: removing duplicate CSV records

I have a CSV file with multiple entries. Example CSV:

user, phone, email
joe, 123, joe@x.com
mary, 456, mary@x.com
ed, 123, ed@x.com

I am trying to remove duplicates based on a specific column of the CSV, but with the code below I get a “list index out of range” error. I thought that by comparing row[1] with newrows[1] I would find all the duplicates and write only the unique entries to file2.csv. This does not work, and I do not understand why.

import csv

f1 = csv.reader(open('file1.csv', 'rb'))
newrows = []
for row in f1:
    if row[1] not in newrows[1]:
        newrows.append(row)
writer = csv.writer(open("file2.csv", "wb"))
writer.writerows(newrows)

My end result is to have a list that keeps the sequence of the file (so a set won't work ... right?), which should look like this:

user, phone, email
joe, 123, joe@x.com
mary, 456, mary@x.com
Answer:

row[1] refers to the second column in the current row (the phone number). That part is fine.

newrows.append(row) appends the entire row, so newrows ends up as a list of rows, not a list of phone numbers.

row[1] in newrows would therefore never find a match, because a single phone number is never equal to a whole row. More importantly, newrows[1] asks for the second element of newrows, and on the first iteration the list is still empty, which is what raises the “list index out of range” error.
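You can reproduce the error directly in the interpreter; the indexing fails before any comparison happens:

>>> newrows = []
>>> newrows[1]
Traceback (most recent call last):
  ...
IndexError: list index out of range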

Here is a corrected version that tracks the phone numbers seen so far in a set:

import csv

f1 = csv.reader(open('file1.csv', 'rb'))
writer = csv.writer(open("file2.csv", "wb"))
phone_numbers = set()                   # phone numbers already written
for row in f1:
    if row[1] not in phone_numbers:     # first time we see this phone number
        writer.writerow(row)            # keep the row, in file order
        phone_numbers.add(row[1])
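As for "set won't work ... right?": a set alone would indeed lose the file order, but a set combined with a list gives you both uniqueness and order. If you want the unique rows in a list (as the expected output above suggests), a minimal sketch of the same idea follows; the helper name seen is just illustrative:

import csv

f1 = csv.reader(open('file1.csv', 'rb'))
seen = set()        # phone numbers encountered so far
newrows = []        # unique rows, in original file order
for row in f1:
    if row[1] not in seen:
        newrows.append(row)
        seen.add(row[1])

csv.writer(open('file2.csv', 'wb')).writerows(newrows)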