I am writing a very simple script that counts the number of occurrences of each (col1, col2) pair in a file. The file is about 300 MB (15 million lines) and has 3 columns. Since I am reading the file line by line, I do not expect Python to use a lot of memory. At most it should be a little over 300 MB to store the counts dictionary.

However, when I look at Activity Monitor, the memory usage exceeds 1.5 GB. What am I doing wrong? If this is normal, can someone please explain? Thanks
import csv

def get_counts(filepath):
    # 'rb' mode: this is Python 2, where the csv module expects binary mode
    with open(filepath, 'rb') as csvfile:
        reader = csv.DictReader(csvfile, fieldnames=['col1', 'col2', 'col3'], delimiter=',')
        counts = {}
        for row in reader:
            key1 = int(row['col1'])
            key2 = int(row['col2'])
            # count occurrences of each (col1, col2) pair
            if (key1, key2) in counts:
                counts[key1, key2] += 1
            else:
                counts[key1, key2] = 1
    return counts
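For reference, I also tried a shorter variant using `collections.Counter` instead of the manual membership test (this is Python 3 here, hence no `'rb'`; the column names are the same placeholders as above):

```python
import csv
from collections import Counter

def get_counts(filepath):
    # newline='' is the recommended way to open CSV files in Python 3
    with open(filepath, newline='') as csvfile:
        reader = csv.DictReader(csvfile, fieldnames=['col1', 'col2', 'col3'], delimiter=',')
        counts = Counter()
        for row in reader:
            # Counter returns 0 for missing keys, so no membership test is needed
            counts[int(row['col1']), int(row['col2'])] += 1
    return counts
```

The memory behavior looks the same either way, which makes sense since `Counter` is a `dict` subclass.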