This is strange. I would expect this error if you tried to compress a large binary file that did not contain many newlines, since such a file could contain a "line" too large for your RAM, but it should not happen with a line-structured file such as a CSV file.
But in any case, compressing a file line by line is not very efficient. Even though the OS buffers disk I/O, it is usually much faster to read and write larger blocks of data, e.g. 64 kB at a time.
I have 2 GB of RAM on this computer, and I just used the program below to successfully compress a 2.8 GB tar archive.
```python
#!/usr/bin/env python
import gzip
import sys

blocksize = 1 << 16     # 64 kB

def gzipfile(iname, oname, level):
    # Copy the input file into the gzip stream in fixed-size blocks.
    with open(iname, 'rb') as f_in:
        f_out = gzip.open(oname, 'wb', level)
        while True:
            block = f_in.read(blocksize)
            if block == '':
                break
            f_out.write(block)
        f_out.close()
    return

def main():
    if len(sys.argv) < 3:
        print "gzip compress in_file to out_file"
        print "Usage:\n%s in_file out_file [compression_level]" % sys.argv[0]
        exit(1)
    iname = sys.argv[1]
    oname = sys.argv[2]
    level = int(sys.argv[3]) if len(sys.argv) > 3 else 6
    gzipfile(iname, oname, level)

if __name__ == '__main__':
    main()
```
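Invoked from the shell like this (the file names here are just placeholders):

```
$ python gzipfile.py input.csv output.csv.gz 9
```

The optional third argument is the gzip compression level, from 1 (fastest) to 9 (smallest output), defaulting to 6 in this script.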
I am running Python 2.6.6, where gzip.open() does not support with.
As Andrew Bay notes in the comments, if block == '': will not work correctly in Python 3, because block contains bytes, not a string, and an empty bytes object does not compare equal to an empty text string. We could check the length of the block, or compare it against b'' (which also works in Python 2.6+), but the simplest way is if not block:.
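For completeness, here is a minimal sketch of the same copy loop for Python 2.7+ and Python 3, where gzip.open() does support with and the truthiness test handles both str and bytes (the function name and defaults are just for illustration):

```python
import gzip

def gzip_blocks(iname, oname, level=6, blocksize=1 << 16):
    # On Python 2.7+ and 3.x, GzipFile objects are context managers,
    # so both files can be closed by a single with statement.
    with open(iname, 'rb') as f_in, gzip.open(oname, 'wb', level) as f_out:
        while True:
            block = f_in.read(blocksize)
            if not block:  # true for both '' (Python 2) and b'' (Python 3)
                break
            f_out.write(block)
```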