How to sort a large text file alphabetically?

I have a text file and need to sort its lines alphabetically. Example input:

This is the first sentence
A sentence here as well
But how do I reorder them?

Output:

A sentence here as well
But how do I reorder them?
This is the first sentence

Here's the thing: this file is so large that I don't have enough memory to split it into a list/array. I tried using Python's built-in sorted() function, and the process was killed.

To give you an idea:

wc -l data
21788172 data
2 answers

It looks like you need an external merge sort: divide the file into blocks small enough to fit in memory, sort each block, and then merge the sorted blocks back together. See the related question "Python class for combining sorted files, how can this be improved?"
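The block-sort-then-merge idea could be sketched roughly as follows. This is only a minimal illustration, not the code from the linked question: the function names and the `lines_per_chunk` parameter are made up here, the file is assumed to be UTF-8, and the chunk size would need tuning to the memory actually available. The merge step relies on the standard library's `heapq.merge`, which reads the sorted chunk files lazily.

```python
import heapq
import os
import tempfile
from itertools import islice


def _write_sorted_chunk(lines, tmp_dir):
    """Sort one in-memory chunk and write it to a temporary file."""
    lines.sort()
    fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".chunk", text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.writelines(lines)
    return path


def external_sort(src_path, dst_path, lines_per_chunk=1_000_000):
    """Sort the lines of src_path into dst_path using bounded memory."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        chunk_paths = []
        with open(src_path, encoding="utf-8") as src:
            while True:
                # Read at most lines_per_chunk lines into memory.
                chunk = list(islice(src, lines_per_chunk))
                if not chunk:
                    break
                # Ensure the final line ends with a newline so lines from
                # different chunks never run together in the output.
                if not chunk[-1].endswith("\n"):
                    chunk[-1] += "\n"
                chunk_paths.append(_write_sorted_chunk(chunk, tmp_dir))

        # Merge all sorted chunks; heapq.merge keeps one line per file in memory.
        files = [open(p, encoding="utf-8") for p in chunk_paths]
        try:
            with open(dst_path, "w", encoding="utf-8") as dst:
                dst.writelines(heapq.merge(*files))
        finally:
            for f in files:
                f.close()
```

Calling `external_sort("data", "data.sorted")` would then produce the sorted file. Note that lines are compared by code point, so uppercase letters sort before lowercase ones; a `key` function would be needed on both the chunk sort and the merge for case-insensitive ordering.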


Similar to what Hugh recommended (but different in that it is not a pure Python solution), you can sort the file in chunks. For example, divide the file into 26 separate files (A.txt, B.txt, C.txt, etc.), sort each of them separately, and then combine them to get the final result.

The main thing to keep in mind is that the first pass through the source file simply splits the lines by their first letter. Only after that do you start sorting each file. A simple cat A.txt B.txt ... will handle the rest.
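For illustration, the same split-by-first-letter approach could also be written in Python instead of with cat. This is a rough sketch under several assumptions that are not in the answer: the function name `bucket_sort` and the `work_dir` argument are hypothetical, buckets are keyed on the exact first character (not just the 26 letters) so that the result is correct for any input, each bucket is assumed to fit in memory on its own, and the number of distinct first characters is assumed to stay below the operating system's open-file limit.

```python
import os


def bucket_sort(src_path, dst_path, work_dir):
    """Sort lines by splitting them into one bucket file per first character."""
    buckets = {}  # first character -> open bucket file
    try:
        # First pass: only distribute lines into buckets, no sorting yet.
        with open(src_path, encoding="utf-8") as src:
            for line in src:
                if not line.endswith("\n"):
                    line += "\n"
                key = line[0]
                if key not in buckets:
                    # Name the bucket by code point so any character
                    # yields a safe filename.
                    path = os.path.join(work_dir, f"{ord(key):06x}.txt")
                    buckets[key] = open(path, "w+", encoding="utf-8")
                buckets[key].write(line)

        # Second pass: sort each bucket in memory and append the buckets
        # to the output in order of their first character.
        with open(dst_path, "w", encoding="utf-8") as dst:
            for key in sorted(buckets):
                f = buckets[key]
                f.seek(0)
                dst.writelines(sorted(f))
    finally:
        for f in buckets.values():
            f.close()
```

If the lines are unevenly distributed (for example, most of them start with the same letter), a single bucket may still be too large to sort in memory, in which case the merge-based approach is the safer choice.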
