You can use the following strategy of division and subjugation:
Create an H () function that can assign a number to each entry in the input file. For r2, which will be sorted after r1, it must return a larger number for r2 than for r1. Use this function to split all records into separate files that will fit into memory so you can sort them. Once you do this, you can simply concatenate the sorted files to get one large sorted file.
Suppose you have this input file, where each line represents an entry
Alan Smith Jon Doe Bill Murray Johnny Cash
Let's just build H () so that it uses the first letter in the entry so that you can get up to 26 files, but in this example you just get 3:
<file1> Alan Smith <file2> Bill Murray <file10> Jon Doe Johnny Cash
Now you can sort each individual file. To replace "John Doe" and "Johnny Cash" in <file10>. Now, if you just merge 3 files, you will have a sorted version of the input.
Please note that you divide first and only win (sort) later. However, you are required to perform the separation in such a way that the resulting parts that you need to sort do not overlap, making it easier to combine the result.
The method by which you implement the partition function H () is largely dependent on the nature of your input. After you figure out this part, the rest should be easy.
VoidPointer Jan 18 2018-10-18 17:17
source share