One million lines is not very much. At 100 bytes per line, that is roughly 100 MB of data in memory, so do the simple thing and move on:
File.readlines("file").sample(100)
If the file grows beyond what fits comfortably in memory, the next step is to make one pass through the file to record the position where each line starts, and then seek to randomly chosen positions to pull the samples out:
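class RandomLine
  def initialize(fn)
    @file = File.open(fn, 'r')
    # Byte offset where each line starts (IO#lines is gone in Ruby 3, so use each_line;
    # bytesize rather than size so multibyte characters do not skew the offsets).
    offsets = @file.each_line.inject([0]) { |m, l| m << m.last + l.bytesize }
    offsets.pop # the last entry is the end-of-file offset, not the start of a line
    @positions = offsets.shuffle
  end

  def pick
    pos = @positions.pop
    return nil if pos.nil? # every line has already been picked
    @file.seek(pos)
    @file.gets
  end
end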
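For a one-shot sample, the same offset idea can be sketched as a single function. This is only an illustration, not part of the answer's class: the name `sample_lines_by_offset` and the temp-file demo are assumptions made up for the example.

```ruby
require 'tempfile'

# Hypothetical helper (not from the answer above): return up to n distinct
# random lines from the file at path without loading the whole file into memory.
def sample_lines_by_offset(path, n)
  File.open(path, 'r') do |file|
    # Pass 1: record the byte offset at which each line starts.
    offsets = []
    pos = 0
    file.each_line do |line|
      offsets << pos
      pos += line.bytesize
    end

    # Pass 2: choose n distinct offsets, then seek to each one and read a line.
    offsets.sample(n).map do |offset|
      file.seek(offset)
      file.gets
    end
  end
end

# Demo on a throwaway file of 1,000 numbered lines.
Tempfile.create('sample_demo') do |tmp|
  1000.times { |i| tmp.puts("line #{i}") }
  tmp.flush
  puts sample_lines_by_offset(tmp.path, 5)
end
```

Only the array of integer offsets is held in memory, so the cost is one integer per line rather than the full text of every line. If even that is too much, a different technique would be needed.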