If you can open all 1445 output files at once, it's pretty simple:
    paths = ['abc{}.dat'.format(i) for i in range(1445)]
    files = [open(path, 'w') for path in paths]
    for inpath in ('input{}.dat'.format(i) for i in range(40000)):
        with open(inpath, 'r') as infile:
            # line N of each input file goes to output file N
            for linenum, line in enumerate(infile):
                files[linenum].write(line)
    for f in files:
        f.close()
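Whether you can actually open that many files depends on your OS's per-process descriptor limit. On Unix you can check, and sometimes raise, that limit with the standard resource module (a minimal sketch; resource is Unix-only, and the headroom figure is just illustrative):

    import resource

    # How many file descriptors may this process have open?
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    target = 1445 + 64  # leave headroom for stdin/stdout/etc.
    if soft < target and (hard == resource.RLIM_INFINITY or hard >= target):
        # Raise the soft limit; the hard limit caps what we can ask for.
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))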
If you can put everything in memory instead (40000 input files × 1445 lines is roughly 58 million lines, so at an average of 10-100 bytes per line that comes to about 0.5-5.0 GB of data, which should be fine on a 64-bit machine with 8 GB of RAM ...), you can do it like this:
    data = [[] for _ in range(1445)]
    for inpath in ('input{}.dat'.format(i) for i in range(40000)):
        with open(inpath, 'r') as infile:
            # accumulate line N of every input file under index N
            for linenum, line in enumerate(infile):
                data[linenum].append(line)
    for i, contents in enumerate(data):
        with open('abc{}.dat'.format(i), 'w') as outfile:
            outfile.write(''.join(contents))
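Note that outfile.writelines(contents) would do the same job here without building one big string first, since writelines just writes each string in the iterable as-is.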
If neither of those fits, you may need some kind of hybrid. For example, if you can only keep 250 files open at once, do 6 batches, and in each batch skip the first batchnum * 250 lines of each infile, as in the sketch below.
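A minimal sketch of that hybrid, assuming the batch size of 250 from the example above; itertools.islice does the line skipping:

    import itertools

    batch_size = 250
    for batchnum in range(6):
        start = batchnum * batch_size
        stop = min(start + batch_size, 1445)
        files = [open('abc{}.dat'.format(i), 'w') for i in range(start, stop)]
        for inpath in ('input{}.dat'.format(i) for i in range(40000)):
            with open(inpath, 'r') as infile:
                # skip the lines earlier batches handled, take this batch's slice
                for offset, line in enumerate(itertools.islice(infile, start, stop)):
                    files[offset].write(line)
        for f in files:
            f.close()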
If the batch solution is too slow because every batch re-reads each input file from the top, record infile.tell() at the end of each batch for each file, and when you come back to that file, use infile.seek() to jump straight back there. Something like this:
    seekpoints = [0 for _ in range(40000)]
    for batch in range(6):
        start = batch * 250
        stop = min(start + 250, 1445)
        paths = ['abc{}.dat'.format(i) for i in range(start, stop)]
        files = [open(path, 'w') for path in paths]
        for infilenum, inpath in enumerate('input{}.dat'.format(i)
                                           for i in range(40000)):
            with open(inpath, 'r') as infile:
                # jump back to where the previous batch left off
                infile.seek(seekpoints[infilenum])
                # read with readline() rather than iteration so tell() stays usable
                for outfile in files:
                    outfile.write(infile.readline())
                seekpoints[infilenum] = infile.tell()
        for f in files:
            f.close()
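One caveat worth knowing: for files opened in text mode, seek() should only be given offsets previously returned by tell(), which is exactly what the code above stores. It also reads with readline() instead of iterating over the file, because Python 3 text files refuse tell() after they have been read via iteration (you get "telling position disabled by next() call").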