A quick way to merge huge files (>= 7 GB) into one

I have three huge files, each with only 2 columns, and I need both columns. I want to merge them into a single file, which I can then write to an SQLite database.

I used Python and got the job done, but it took more than 30 minutes, and my system hung for 10 of them. I was wondering if there is a faster way using awk or any other Unix tool. A faster way within Python would also be great. The code is below:

'''We have tweets from three months in 3 different files.
Combine them into a single file.'''
import sys

# Copy each input file into the output file, one line at a time.
with open(sys.argv[4], 'w') as out:
    for path in sys.argv[1:4]:
        with open(path, 'r') as source:
            for line in source:
                out.write(line)
+5
3 answers

The standard Unix way to concatenate files is cat. It may not be much faster than your script, but it will be faster.

cat file1 file2 file3 > bigfile

Depending on your setup, you may even be able to skip the intermediate file and pipe the output of cat straight into sqlite:

cat file1 file2 file3 | sqlite database

In Python, you will probably get better performance if you do your I/O in large blocks rather than a line at a time. Use file.read(65536) to read 64 KB of data at a time instead of iterating over the lines with a for loop.
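A minimal sketch of that block-copy idea, assuming the same command-line arguments as the question's script (three input files followed by the output file):

import sys

BLOCK_SIZE = 65536  # 64 KB per read, as suggested above

# Copy the three input files into the output file in fixed-size binary
# blocks instead of iterating line by line.
with open(sys.argv[4], 'wb') as out:
    for name in sys.argv[1:4]:
        with open(name, 'rb') as src:
            while True:
                block = src.read(BLOCK_SIZE)
                if not block:
                    break
                out.write(block)

This avoids the per-line Python overhead of the original script while producing the same merged file.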

+12

The UNIX way:

cat file1 file2 file3 > file4
+2

I assume that you need to repeat this process and that speed is a critical factor.

Try opening the files in binary mode and experiment with the size of the block you read. Try 4096 and 8192 bytes, as these are common buffer sizes.
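One way to run that experiment, sketched with shutil.copyfileobj (which copies between file objects in blocks of the given length); the command-line arguments are assumed to follow the question's script, and the candidate block sizes are just examples:

import shutil
import sys
import time

# Time the merge with a few candidate block sizes and print the results.
for block_size in (4096, 8192, 65536):
    start = time.time()
    with open(sys.argv[4], 'wb') as out:
        for name in sys.argv[1:4]:
            with open(name, 'rb') as src:
                shutil.copyfileobj(src, out, block_size)
    print(block_size, 'bytes:', round(time.time() - start, 2), 'seconds')

Run each size more than once, since the operating system's file cache can skew the first measurement.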

There is a similar question, "Is it possible to speed up python I/O?", which may also be of interest.

+1