Using a buffered reader for large .csv files in Python

I am trying to open large .csv files (16k+ lines, ~15 columns) in a Python script, and I am running into a problem.

I use the built-in open() function to open the file and then create a csv.DictReader on that input file. The loop is structured as follows:

    for (i, row) in enumerate(reader):
        # do stuff (send serial packet, read response)

However, with any file longer than about 20 lines, the file opens, but within a few iterations I get a ValueError: I/O operation on closed file.
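
For reference, here is a minimal sketch of how I open and iterate over the file; the filename and the per-row work are placeholders for my real code:

    import csv

    input_file = open("data.csv", "rb")   # hypothetical filename; "rb" as the Python 2 csv docs suggest
    reader = csv.DictReader(input_file)

    for (i, row) in enumerate(reader):
        # placeholder for the real work (send serial packet, read response)
        print i, row

    input_file.close()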

My thought is that I may be running out of memory (although the 16k-line file is only about 8 MB and I have 3 GB of RAM), in which case I expect I would need some kind of buffering to load only sections of the file into memory at a time.

Am I on the right track? Or could there be other reasons for the file closing unexpectedly?

edit: In about half of the cases where I run this with a csv of only 11 lines, it still gives me the ValueError. The error does not always occur on the same line.

+4
2 answers

16k lines is nothing for 3 GB of RAM; most likely your problem is something else, e.g. you are spending too much time in some other process that interferes with the open file. To be sure, and in any case for speed, since you have 3 GB of RAM, load the whole file into memory and then parse it, e.g.:

    import csv
    import cStringIO

    data = open("/tmp/1.csv").read()
    reader = csv.DictReader(cStringIO.StringIO(data))
    for row in reader:
        print row

This way, at least, you should not get the error about the file being closed.
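
If you are on Python 3, cStringIO no longer exists; a rough equivalent of the same idea, assuming the same /tmp/1.csv path, would be:

    import csv
    import io

    # Read the whole file into memory first, then parse it from a string buffer.
    with open("/tmp/1.csv", newline="") as f:
        data = f.read()

    reader = csv.DictReader(io.StringIO(data))
    for row in reader:
        print(row)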

+4

csv.reader is faster. Read the whole file in chunks. To avoid memory leaks, it is better to use a subprocess, e.g. Process from the multiprocessing module:

    from multiprocessing import Process

    def child_process(resource):
        # Do the read-and-process stuff here.
        pass

    if __name__ == '__main__':
        # Get the file object / resource.
        # .....
        p = Process(target=child_process, args=(resource,))
        p.start()
        p.join()

For more information, see this link: http://articlesdictionary.wordpress.com/2013/09/29/read-csv-file-in-python/
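
A slightly fuller sketch of this idea, in which the child process opens the CSV itself and does the row handling so the parent never holds an open file handle (the path and the per-row work are hypothetical):

    import csv
    from multiprocessing import Process

    def child_process(path):
        # Open the CSV inside the child so no open file handle is shared with the parent.
        f = open(path, "rb")
        reader = csv.reader(f)
        for row in reader:
            # placeholder for the real per-row work
            print row
        f.close()

    if __name__ == '__main__':
        p = Process(target=child_process, args=("/tmp/1.csv",))  # hypothetical path
        p.start()
        p.join()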

-1
