Comparing two csv files and getting the difference

Question

I have two CSV files that I need to compare and then spit out the differences:

CSV FORMAT:

    Name    Produce    Number
    Adam    Apple      5
    Tom     Orange     4
    Adam    Orange     11

I need to compare the two CSV files and report whether there is a difference between Adam's apples on sheet 1 and sheet 2, and do this for all names and produce numbers. Both CSV files will be generated the same way.

Any pointers will be appreciated.

python csv

Trying_hard Jun 19 '12 at 20:14

6 answers

Jon clements · Answer 1 · 2012-06-19T20:40:12+0000

If your CSV files aren't so large that loading them into memory would bring your machine to its knees, you can try something like:

    import csv

    def load_rows(fname):
        # Rows are converted to tuples so they are hashable and can be put in a set.
        with open(fname, newline='') as f:
            return set(tuple(row) for row in csv.reader(f))

    set1 = load_rows('file1.csv')
    set2 = load_rows('file2.csv')

    print(set1 - set2)  # in 1, not in 2
    print(set2 - set1)  # in 2, not in 1
    print(set1 & set2)  # in both

For large files, you can load them into an SQLite3 database and use SQL queries to do the same, or sort both files by the appropriate keys and then merge them.
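As a rough sketch of the SQLite route: the table names `sheet1` and `sheet2`, the helper names, and the three-column Name/Produce/Number layout with a header row are all assumptions taken from the question, not part of the answer above. Each file is loaded into its own table and compared with a few queries:

```python
import csv
import sqlite3

def load_table(conn, table, fname):
    # Hypothetical helper: copy a Name/Produce/Number CSV into a table.
    conn.execute(
        f"CREATE TABLE {table} (name TEXT, produce TEXT, number INTEGER)"
    )
    with open(fname, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row (assumed to be present)
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", reader)

def compare_csv_sqlite(fname1, fname2, db=":memory:"):
    conn = sqlite3.connect(db)
    load_table(conn, "sheet1", fname1)
    load_table(conn, "sheet2", fname2)

    # Full rows that appear in one file but not in the other.
    only1 = conn.execute(
        "SELECT * FROM sheet1 EXCEPT SELECT * FROM sheet2 ORDER BY 1, 2"
    ).fetchall()
    only2 = conn.execute(
        "SELECT * FROM sheet2 EXCEPT SELECT * FROM sheet1 ORDER BY 1, 2"
    ).fetchall()

    # Same (name, produce) key in both files, but a different number.
    changed = conn.execute(
        "SELECT a.name, a.produce, a.number, b.number "
        "FROM sheet1 a JOIN sheet2 b "
        "ON a.name = b.name AND a.produce = b.produce "
        "WHERE a.number <> b.number "
        "ORDER BY a.name, a.produce"
    ).fetchall()

    conn.close()
    return only1, only2, changed
```

For files that really are too big for memory, passing a file path as `db` instead of the default `":memory:"` keeps the tables on disk.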

Aakash gupta · Answer 2 · 2016-07-23T12:48:00+0000

I used csvdiff

    $ pip install csvdiff
    $ csvdiff --style=compact col1 a.csv b.csv

Link to package on pypi

I found this link useful

Somekittens · Answer 3 · 2012-06-19T20:21:14+0000

One of the best utilities for comparing two different files is diff.

For a Python implementation, see: Comparing two .txt files using difflib in Python
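A minimal sketch of that idea with the standard-library `difflib` module, treating each CSV file as plain lines of text (the function name `diff_files` is hypothetical). Because this is a line-by-line text comparison, rows that are merely reordered will also show up as differences:

```python
import difflib

def diff_files(fname1, fname2):
    # Read both files as lists of text lines, keeping the line endings.
    with open(fname1) as f1, open(fname2) as f2:
        lines1 = f1.readlines()
        lines2 = f2.readlines()
    # unified_diff yields header lines, then lines prefixed with
    # ' ' (unchanged), '-' (only in file 1) or '+' (only in file 2).
    return list(
        difflib.unified_diff(lines1, lines2, fromfile=fname1, tofile=fname2)
    )
```

The result can be printed with `sys.stdout.writelines(diff_files('file1.csv', 'file2.csv'))`.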

Hugh bothwell · Answer 4 · 2012-06-19T20:37:59+0000

    import csv

    def load_csv_to_dict(fname, get_key, get_data):
        # Map each row's key to its data value, skipping the header row.
        with open(fname, newline='') as inf:
            incsv = csv.reader(inf)
            next(incsv)  # skip header
            return {get_key(row): get_data(row) for row in incsv}

    def main():
        key = lambda r: tuple(r[0:2])  # (Name, Produce)
        data = lambda r: int(r[2])     # Number

        f1 = load_csv_to_dict('file1.csv', key, data)
        f2 = load_csv_to_dict('file2.csv', key, data)

        f1keys = set(f1)
        f2keys = set(f2)

        print("Keys in file1 but not file2:")
        print(", ".join(str(a)+":"+str(b) for a,b in (f1keys-f2keys)))

        print("Keys in file2 but not file1:")
        print(", ".join(str(a)+":"+str(b) for a,b in (f2keys-f1keys)))

        print("Differing values:")
        for k in (f1keys & f2keys):
            a,b = f1[k], f2[k]
            if a != b:
                print("{}:{} {} <> {}".format(k[0],k[1], a, b))

    if __name__=="__main__":
        main()

octopusgrabbus · Answer 5 · 2012-06-20T13:00:33+0000

If you want to use Python's csv module together with a generator function, you can step through both files lazily and compare large .csv files without loading them into memory. The following example compares the files row by row:

    import csv

    def csv_lazy_get(csvfile):
        # Yield one parsed row at a time so the whole file is never held in memory.
        with open(csvfile, newline='') as f:
            r = csv.reader(f)
            for row in r:
                yield row

    def csv_cmp_lazy(csvfile1, csvfile2):
        gen_2 = csv_lazy_get(csvfile2)

        for row_1 in csv_lazy_get(csvfile1):
            row_2 = next(gen_2, None)  # None once the second file runs out of rows

            print("row_1: ", row_1)
            print("row_2: ", row_2)

            if row_2 == row_1:
                print("row_1 is equal to row_2.")
            else:
                print("row_1 is not equal to row_2.")

        gen_2.close()

Chrisp · Answer 6 · 2012-06-19T20:26:04+0000

Here's a start that doesn't use difflib. It is really just a point to build from, because Adam and apples might appear twice on the same sheet; can you guarantee that this is not the case? Should the apples be summed, or is that an error?

    import csv

    sheet1 = {}
    with open('sheet.csv', newline='') as fsock:
        rdr = csv.reader(fsock)
        next(rdr)  # skip the header row
        for row in rdr:
            name, produce, amount = row
            sheet1[(name, produce)] = int(amount)  # always an integer?
    # repeat the above for the second sheet, then compare

You get the idea?
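One way to finish that outline, as a sketch only: it assumes that duplicate (name, produce) rows should be summed rather than treated as an error, that both files start with a header row, and that amounts are always integers. The names `load_sheet` and `compare_sheets` are hypothetical.

```python
import csv
from collections import Counter

def load_sheet(fname):
    # Total the amounts per (name, produce) pair; duplicates are summed
    # (an assumption, the question does not say how to treat them).
    totals = Counter()
    with open(fname, newline="") as f:
        rdr = csv.reader(f)
        next(rdr)  # skip the header row (assumed to be present)
        for name, produce, amount in rdr:
            totals[(name, produce)] += int(amount)
    return totals

def compare_sheets(fname1, fname2):
    sheet1 = load_sheet(fname1)
    sheet2 = load_sheet(fname2)
    diffs = {}
    # Visit every key seen in either sheet; None marks a missing entry.
    for key in sorted(set(sheet1) | set(sheet2)):
        a, b = sheet1.get(key), sheet2.get(key)
        if a != b:
            diffs[key] = (a, b)
    return diffs
```

`compare_sheets('sheet.csv', 'sheet2.csv')` then returns a dictionary that maps each differing (name, produce) pair to its two totals.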
