You want to create an in-memory index for the file:
- create an empty list
- open the file
- read it line by line (using f.readline()) and save in the list a tuple consisting of the value you want to sort on (extracted using line.split('\t')[sort_col].strip()) and the offset of the line in the file (which you can get by calling f.tell() before calling f.readline())
- close the file
- sort the list
Then, to print the sorted file, open the file again and, for each item in your list, use f.seek(offset) to move the file pointer to the beginning of the line, f.readline() to read the line, and print the line.
Optimization: you can also store the length of each line in the list, so that you can use f.read(length) at the printing stage.
Sample code (optimized for reading, not speed):
    def build_index(filename, sort_col):
        # Collect one (sort key, line offset, line length) tuple per line.
        index = []
        f = open(filename)
        while True:
            offset = f.tell()      # position of the line about to be read
            line = f.readline()
            if not line:           # empty string means end of file
                break
            length = len(line)
            col = line.split('\t')[sort_col].strip()
            index.append((col, offset, length))
        f.close()
        index.sort()               # tuples sort on the key first
        return index

    def print_sorted(filename, sort_col):
        index = build_index(filename, sort_col)
        f = open(filename)
        for col, offset, length in index:
            f.seek(offset)         # jump to the start of the line
            print(f.read(length).rstrip('\n'))
        f.close()

    if __name__ == '__main__':
        filename = 'somefile.txt'
        sort_col = 2
        print_sorted(filename, sort_col)
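To try the approach without a real data file, here is a minimal self-contained sketch of the same index-then-seek idea. It is a variation, not a definitive implementation. It assumes Python 3 and UTF-8 text, and opens the file in binary mode so that f.tell() offsets and line lengths are plain byte counts. The names sorted_lines and demo, and the three sample rows, are made up for illustration.

```python
import os
import tempfile


def build_index(filename, sort_col):
    """Return a sorted list of (key, offset, length), one entry per line."""
    index = []
    # Binary mode: tell() gives byte offsets and len(line) is a byte count.
    with open(filename, 'rb') as f:
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break
            key = line.split(b'\t')[sort_col].strip()
            index.append((key, offset, len(line)))
    index.sort()
    return index


def sorted_lines(filename, sort_col):
    """Yield the lines ordered by the chosen column (hypothetical helper)."""
    index = build_index(filename, sort_col)
    with open(filename, 'rb') as f:
        for _key, offset, length in index:
            f.seek(offset)
            # Assumption: the file is UTF-8 encoded.
            yield f.read(length).decode('utf-8').rstrip('\r\n')


def demo():
    """Write a small made-up TSV file and return its lines sorted by column 1."""
    rows = ["3\tpear\tx\n", "1\tapple\ty\n", "2\tfig\tz\n"]
    with tempfile.NamedTemporaryFile('w', suffix='.tsv', delete=False,
                                     newline='') as tmp:
        tmp.writelines(rows)
        path = tmp.name
    try:
        return list(sorted_lines(path, 1))
    finally:
        os.remove(path)


if __name__ == '__main__':
    for line in demo():
        print(line)
```

Only the keys, offsets and lengths are held in memory; the full lines are read from disk one at a time. The keys are compared as strings, so a numeric column would need to be converted (for example with int()) before being stored in the tuple.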
gurney alex