Firstly, step 3 itself is not a merge sort, it's all a merge sort. Step 3 is just a merge, without sorting at all.
And as for the required storage, there are two possibilities.
The first is to combine the sorted data into groups of two. Let's say you have three groups:
A: 1 3 5 7 9 B: 0 2 4 6 8 C: 2 3 5 7
Using this method, you combine A and B into one group Y , then merge Y and C into the final result Z :
Y: 0 1 2 3 4 5 6 7 8 9 (from merging A and B). Z: 0 1 2 2 3 3 4 5 5 6 7 7 8 9 (from merging Y and C).
This has the advantage of a very small requirement for read-only memory in that you only need to store the βnextβ item from each of the two lists, but, of course, you need to perform several merge operations.
The second way is the N-way βrightβ merge when you select the next item from any of the groups. In doing so, you will check the lowest value in each list to see which one will be next:
Z: 0 1 2 2 3 3 4 5 5 6 7 7 8 9 (from merging A, B and C).
This includes only one merge operation, but it requires more memory, basically one item in the list.
Which one you choose depends on the available memory and the size of the element.
For example, if you have 100M of memory available to you and the element size is 100K, you can use the latter. This is because for a 2G file you will need 20 groups (100 M each) for the sort phase, which means that the correct N-shaped merge will require 100 K by 20 or about 2 M, which is significantly lower than your memory availability.
Alternatively, suppose you only have 1M. It will be about 2000 (2G / 1M) groups and multiplying by 100K gives 200M, which greatly exceeds your capabilities.
So you will need to do this merge in a few passes. Keep in mind that this should not be multiple passes combining the two lists.
You can find an average site where, for example, each passage combines ten lists. Ten groups of 100K is only mega, so they will fit into your memory limit, and this will result in fewer merge passes.