If you can reserve 16 GB of memory, I wrote a program called sample that shuffles the lines of a file by reading in their byte offsets, shuffling those offsets, and then printing output by seeking through the file at each shuffled offset. It uses 8 bytes per 64-bit offset, thus 16 GB for a two-billion-line input.
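For illustration, here is a minimal C sketch of that approach. It is not the actual sample source, just the same idea in miniature: one pass to record line-start offsets, a Fisher-Yates shuffle, then seeks back through the file. A real implementation would need a stronger RNG than rand() for two billion lines, plus more careful error handling.

```c
/* Minimal sketch, not the actual `sample` source; error handling
 * and RNG quality are deliberately simplified. Assumes POSIX fseeko. */
#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char **argv) {
    if (argc != 2) { fprintf(stderr, "usage: %s FILE\n", argv[0]); return 1; }
    FILE *fp = fopen(argv[1], "rb");
    if (!fp) { perror("fopen"); return 1; }

    /* Pass 1: record the byte offset of each line start (8 bytes per line). */
    size_t n = 0, cap = 1024;
    long long *offsets = malloc(cap * sizeof *offsets);
    long long pos = 0;
    int c, at_line_start = 1;
    while ((c = fgetc(fp)) != EOF) {
        if (at_line_start) {
            if (n == cap) { cap *= 2; offsets = realloc(offsets, cap * sizeof *offsets); }
            offsets[n++] = pos;
            at_line_start = 0;
        }
        if (c == '\n') at_line_start = 1;
        pos++;
    }

    /* Fisher-Yates shuffle of the offsets; rand() is too weak for
     * billions of lines and is used here only to keep the sketch short. */
    srand((unsigned) time(NULL));
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t) rand() % i;
        long long tmp = offsets[i - 1];
        offsets[i - 1] = offsets[j];
        offsets[j] = tmp;
    }

    /* Pass 2: seek to each shuffled offset and copy that line to stdout. */
    for (size_t i = 0; i < n; i++) {
        fseeko(fp, (off_t) offsets[i], SEEK_SET);
        while ((c = fgetc(fp)) != EOF && c != '\n')
            putchar(c);
        putchar('\n');
    }

    free(offsets);
    fclose(fp);
    return 0;
}
```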
It won't be fast, but on a system with enough memory, sample will shuffle files that are large enough to cause GNU shuf to fail. Further, it uses mmap routines to try to minimize the I/O expense of the second pass through your file. It also has some other options; see --help for details.
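To sketch the mmap idea for the second pass, the seek-and-read loop above could instead index into a memory-mapped view of the file, letting the kernel page data in on demand. print_shuffled is a hypothetical helper for illustration, not part of sample's interface:

```c
/* Sketch of an mmap-based second pass (POSIX). Assumes offsets[]
 * has already been collected and shuffled as in the previous sketch;
 * print_shuffled is an illustrative name, not sample's API. */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static int print_shuffled(const char *path, const long long *offsets, size_t n) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) { close(fd); return -1; }
    char *base = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (base == MAP_FAILED) { close(fd); return -1; }
    /* Access pattern is random, so hint the kernel away from readahead. */
    madvise(base, (size_t) st.st_size, MADV_RANDOM);
    for (size_t i = 0; i < n; i++) {
        const char *line = base + offsets[i];
        size_t remain = (size_t) (st.st_size - offsets[i]);
        const char *nl = memchr(line, '\n', remain);
        fwrite(line, 1, nl ? (size_t) (nl - line) + 1 : remain, stdout);
    }
    munmap(base, (size_t) st.st_size);
    close(fd);
    return 0;
}
```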
By default, this program samples without replacement and shuffles by single lines. If you want to shuffle with replacement, or if your input is in FASTA, FASTQ, or another multi-line format, you can add some options to adjust how sampling is done. (Or you can apply an alternative approach, which I link to in the Perl gist below, but sample handles these cases.)
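The difference between the two modes comes down to how indices into the offset array are drawn. A rough sketch, with hypothetical helper names rather than sample's internals:

```c
/* Sketch of the two sampling modes over a collected offsets[] array;
 * these are hypothetical helpers, not sample's internals. Assumes
 * srand() was already called and rand() suffices for the input size. */
#include <stdlib.h>

/* Without replacement: a partial Fisher-Yates moves k distinct
 * offsets to the front of the array, in random order. */
void sample_without_replacement(long long *offsets, size_t n, size_t k) {
    for (size_t i = 0; i < k && i < n; i++) {
        size_t j = i + (size_t) rand() % (n - i);
        long long tmp = offsets[i];
        offsets[i] = offsets[j];
        offsets[j] = tmp;
    }
}

/* With replacement: k independent draws, so the same line may
 * be selected (and printed) more than once. */
void sample_with_replacement(const long long *offsets, size_t n,
                             long long *out, size_t k) {
    for (size_t i = 0; i < k; i++)
        out[i] = offsets[(size_t) rand() % n];
}
```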
If your FASTA sequences are on every two lines, that is, they alternate between a sequence header on one line and sequence data on the next, you can still shuffle with sample, and with half the memory, since you are only shuffling half the number of offsets. See the --lines-per-offset option; you would specify 2, for example, to shuffle pairs of lines.
In the case of FASTQ files, their records split every four lines. You can specify --lines-per-offset=4 to shuffle a FASTQ file with a fourth of the memory needed to shuffle a single-line file.
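Internally, an option like --lines-per-offset can be sketched as recording an offset only at the start of every k-th line; the printing pass then copies k lines per seek instead of one. This is illustrative, not sample's actual code:

```c
/* Sketch of the --lines-per-offset idea (illustrative, not sample's
 * actual code): record an offset only at the start of every k-th line,
 * so an n-line file needs only about n/k stored offsets. */
#include <stdio.h>
#include <stdlib.h>

size_t collect_offsets(FILE *fp, size_t lines_per_offset, long long **out) {
    size_t n = 0, cap = 1024, line = 0;
    long long *offsets = malloc(cap * sizeof *offsets);
    long long pos = 0;
    int c, at_line_start = 1;
    while ((c = fgetc(fp)) != EOF) {
        if (at_line_start) {
            if (line % lines_per_offset == 0) {  /* start of a record */
                if (n == cap) { cap *= 2; offsets = realloc(offsets, cap * sizeof *offsets); }
                offsets[n++] = pos;
            }
            line++;
            at_line_start = 0;
        }
        if (c == '\n') at_line_start = 1;
        pos++;
    }
    *out = offsets;
    return n;  /* about total_lines / lines_per_offset entries */
}
```

With lines_per_offset set to 2 for two-line FASTA or 4 for FASTQ, the offset array is half or a quarter the size, which is where the memory savings described above come from.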
Alternatively, I have a gist here, written in Perl, which samples sequences without replacement from a FASTA file, regardless of the number of lines per sequence. Note that this is not exactly the same as shuffling the whole file, but you could use it as a starting point, since it collects the offsets. Instead of sampling some of the offsets, you would remove line 47, which sorts the shuffled indices, and then use file seek operations to read through the file, using the shuffled-index list directly (essentially what the first C sketch above does).
Again, this won't be fast, because you are jumping through a very large file out of order, but storing offsets is much cheaper than storing whole lines, and adding mmap routines could help a bit with what is essentially a series of random access operations. And if you are working with FASTA, you'll have still fewer offsets to store, so your memory usage (excepting any relatively trivial container and program overhead) should be at most 8 GB, and likely less, depending on its structure.