To do this, you need either fixed-length lines (in which case the implementation details should be obvious) or information on how many lines there are and (possibly, to improve performance) at what offsets within the file they start (a sort of index).
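For the fixed-length case, a minimal sketch might look like the following. It assumes every line occupies exactly `record_len` bytes including its newline; the function name `random_fixed_length_line` and its parameters are made up for illustration.

```python
import os
import random


def random_fixed_length_line(path, record_len, rng=random):
    """Return one uniformly chosen line from a file of fixed-length lines.

    Assumption: every line occupies exactly record_len bytes, newline included.
    """
    count = os.path.getsize(path) // record_len  # number of complete lines
    if count == 0:
        raise ValueError("file contains no complete line")
    with open(path, "rb") as f:
        # Jump straight to the start of a randomly chosen line.
        f.seek(rng.randrange(count) * record_len)
        return f.read(record_len).rstrip(b"\r\n")
```

Because the position of line `i` is simply `i * record_len`, no index is needed and the cost does not depend on the file size.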
For small files, you can build such an index on demand whenever you need a random line. To do this efficiently for large files, you need to keep the index around permanently, possibly in a separate file.
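The on-demand index could be sketched roughly as below: one pass records the byte offset at which each line starts, and a random entry from that list is then used as a seek target. The names `build_line_index` and `random_indexed_line` are hypothetical, and the index is held in memory here rather than stored in a separate file.

```python
import random


def build_line_index(path):
    """Scan the file once and return the byte offset where each line starts."""
    offsets = []
    pos = 0
    with open(path, "rb") as f:
        for line in f:  # binary mode: lines are split on b"\n"
            offsets.append(pos)
            pos += len(line)
    return offsets


def random_indexed_line(path, offsets, rng=random):
    """Return a uniformly chosen line, using a previously built offset index."""
    if not offsets:
        raise ValueError("index is empty")
    with open(path, "rb") as f:
        f.seek(rng.choice(offsets))
        return f.readline().rstrip(b"\r\n")
```

For a large file, the list returned by `build_line_index` is what you would save and keep up to date, so that the full scan does not have to be repeated for every lookup.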
If the lines tend to be about the same length, and you do not need perfect "randomness," you can also pick a random byte offset within the file and scan for the nearest line break.
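A rough sketch of the byte-offset approach, under the assumption that skipping forward to the next line break and wrapping around at the end of the file is acceptable (the name `approx_random_line` is invented for this example):

```python
import os
import random


def approx_random_line(path, rng=random):
    """Return an approximately random line by seeking to a random byte offset.

    Not uniform: a line is returned whenever the offset lands in the line
    before it, so lines that follow long lines are chosen more often.
    """
    size = os.path.getsize(path)
    if size == 0:
        raise ValueError("file is empty")
    with open(path, "rb") as f:
        f.seek(rng.randrange(size))
        f.readline()          # discard the (probably partial) line we landed in
        line = f.readline()   # the next complete line
        if not line:          # we landed in the last line: wrap to the first
            f.seek(0)
            line = f.readline()
        return line.rstrip(b"\r\n")
```

This needs no index and reads only a few lines per call, but the selection is only roughly uniform, and only when line lengths are similar.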
Michael Borgwardt