I'm trying to load a file into memory in Python. It's a very large file (1.5 GB), but I have enough memory available, and I only need to do this once (hence Python: I just need to sort the file once, so Python is a simple choice).
My problem is that loading this file uses a huge amount of memory. After loading about 10% of the lines, Python is already using 700 MB, which is clearly too much. At about 50%, the script freezes at 3.03 GB of real memory (and still slowly rising).
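Figures like these are plausibly explained by per-object overhead: every line becomes a tuple plus four separate string objects, each with its own object header, plus a pointer slot in the list. The sketch below estimates that cost with `sys.getsizeof`. The sample field values are made up, `record_size` is a hypothetical helper, and the exact byte counts depend on the Python version and platform (the question's code is Python 2; this sketch targets Python 3).

```python
import struct
import sys


def record_size(record):
    """Approximate bytes for one stored record: the tuple itself plus each of
    its field objects. Shared objects would be counted more than once, so
    treat this as a rough estimate rather than exact accounting."""
    return sys.getsizeof(record) + sum(sys.getsizeof(field) for field in record)


# A made-up row with four short string fields (values are hypothetical).
sample = ("1234567890", "42", "3.14159", "2.71828")
raw_bytes = sum(len(field) for field in sample)

# Each record also occupies one pointer slot in the `lines` list.
per_line = record_size(sample) + struct.calcsize("P")

total_lines = 31164015  # line count used for progress output in the question
estimate_gb = total_lines * per_line / 1e9

print("raw field bytes per line:", raw_bytes)
print("estimated bytes per stored line:", per_line)
print("estimated total: %.1f GB" % estimate_gb)
```

For comparison, 700 MB at 10% of 31,164,015 lines works out to a little over 200 bytes per stored line, far more than the few dozen bytes of actual field text.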
I know this is not the most efficient way to sort a file (in memory), but I just want it to work so I can move on to more important problems :D So what is wrong with the following Python code that makes it use so much memory?
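# input_file_name and the column indices timestamp_pos, personID_pos,
# x_pos and y_pos are defined earlier in the script (not shown).
print('Loading file into memory')
input_file = open(input_file_name, 'r')
input_file.readline()  # Toss out the header
lines = []
totalLines = 31164015.0  # known line count, used only for progress output
currentLine = 0.0
printEvery100000 = 0
for line in input_file:
    currentLine += 1.0
    lined = line.split('\t')
    printEvery100000 += 1
    if printEvery100000 == 100000:
        print(str(currentLine / totalLines))  # fraction of the file loaded
        printEvery100000 = 0
    # Keep four stripped string fields per line, stored as one tuple.
    lines.append((lined[timestamp_pos].strip(), lined[personID_pos].strip(),
                  lined[x_pos].strip(), lined[y_pos].strip()))
input_file.close()
print('Done loading file into memory')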
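One hedged variant of the loop above, assuming all four kept columns hold integers (the question doesn't say; x and y in particular might be floats): convert the fields with `int()` while loading, so that a plain `lines.sort()` orders timestamps numerically instead of as text. Small integers are also usually smaller objects than the equivalent strings, although each line still pays for its tuple. `load_records` and `positions` are hypothetical names, and the rows below are made up.

```python
def load_records(lines, positions):
    """Parse tab-separated text lines into tuples of ints.

    `positions` lists the column indices to keep, in storage order; putting
    the timestamp column first makes a plain list.sort() order by timestamp.
    Assumes every kept column holds an integer (int() also tolerates the
    surrounding whitespace and the trailing newline).
    """
    records = []
    for line in lines:
        fields = line.split("\t")
        records.append(tuple(int(fields[p]) for p in positions))
    return records


# Usage with made-up rows: timestamp, person ID, x, y.
rows = ["30\t7\t1\t2\n", "4\t8\t3\t4\n", "100\t9\t5\t6\n"]
records = load_records(rows, positions=(0, 1, 2, 3))
records.sort()  # numeric order by timestamp, then by the later fields
# → [(4, 8, 3, 4), (30, 7, 1, 2), (100, 9, 5, 6)]

# As strings, the same timestamps would sort lexicographically instead.
string_order = sorted(["30", "4", "100"])
# → ['100', '30', '4']
```

In the question's script this would be called on the open file after the header line has been read, e.g. `load_records(input_file, (timestamp_pos, personID_pos, x_pos, y_pos))`.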
EDIT: … 1) readlines() … 1.7 GB … lines.sort() … int … overhead :D
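Since the question itself notes that sorting in memory is not the most efficient way to sort a file, here is a sketch of the usual alternative, an external merge sort: sort fixed-size chunks of lines, write each chunk to a temporary file, then stream-merge the chunks with `heapq.merge`. It needs Python 3.5+ for the `key` argument of `heapq.merge`. `external_sort`, `chunk_lines` and `timestamp_key` are illustrative names that are not part of the original script, and the assumption that the timestamp is an integer in column 0 is hypothetical.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(in_path, out_path, key, chunk_lines=1000000):
    """Sort the data lines of a text file without loading it all at once.

    The first (header) line is copied through unchanged. The remaining lines
    are read `chunk_lines` at a time, each chunk is sorted in memory and
    written to a temporary file, and the sorted chunks are then merged lazily.
    """
    chunk_paths = []
    with open(in_path) as src:
        header = src.readline()
        while True:
            chunk = list(islice(src, chunk_lines))
            if not chunk:
                break
            # Ensure every line ends with a newline so merged lines never run together.
            chunk = [l if l.endswith("\n") else l + "\n" for l in chunk]
            chunk.sort(key=key)
            fd, path = tempfile.mkstemp(suffix=".chunk", text=True)
            with os.fdopen(fd, "w") as tmp:
                tmp.writelines(chunk)
            chunk_paths.append(path)

    chunk_files = [open(p) for p in chunk_paths]
    try:
        with open(out_path, "w") as dst:
            dst.write(header)
            # heapq.merge holds only one pending line per chunk file in memory.
            dst.writelines(heapq.merge(*chunk_files, key=key))
    finally:
        for f in chunk_files:
            f.close()
        for p in chunk_paths:
            os.remove(p)


def timestamp_key(line):
    """Hypothetical key: assumes the timestamp is an integer in column 0."""
    return int(line.split("\t")[0])
```

Peak memory is then bounded by one chunk of lines rather than the whole file, at the cost of writing and re-reading the data once on disk; `chunk_lines` trades memory for the number of temporary files.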