Suppose the file is so large that you cannot afford to hold it in RAM. Then you want to use Reservoir Sampling - an algorithm designed to randomly select an item from a list of unknown, arbitrary length that may not fit into memory:
Random r = new Random();
int currentLine = 1;
string pick = null;

foreach (string line in File.ReadLines(filename))
{
    // Replace the current pick with probability 1 / currentLine.
    if (r.Next(currentLine) == 0)
    {
        pick = line;
    }
    ++currentLine;
}

return pick;
This algorithm is a bit unintuitive. At a high level, it works by giving each line N a 1 / N chance of replacing the currently selected line. Thus, line 1 has a 100% chance of being selected, but also a 50% chance of later being replaced by line 2.
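If it helps to see that claim empirically, here is a small self-contained sketch (the four-item in-memory array and its names are just stand-ins for a real file). It runs the same replacement rule many times and prints how often each item ends up as the pick, which should come out near 25% apiece:

using System;

class ReservoirDemo
{
    static void Main()
    {
        string[] lines = { "alpha", "beta", "gamma", "delta" }; // stand-in for a file
        var counts = new int[lines.Length];
        var r = new Random();
        const int trials = 100000;

        for (int t = 0; t < trials; t++)
        {
            int pickIndex = -1;
            int currentLine = 1;
            foreach (string line in lines)
            {
                // Same rule as above: line N replaces the pick with probability 1 / N.
                if (r.Next(currentLine) == 0)
                {
                    pickIndex = currentLine - 1;
                }
                ++currentLine;
            }
            counts[pickIndex]++;
        }

        // Each count should be close to trials / lines.Length (about 25% here).
        for (int i = 0; i < lines.Length; i++)
        {
            Console.WriteLine($"{lines[i]}: {100.0 * counts[i] / trials:F1}%");
        }
    }
}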
I found this algorithm easiest to understand in the form of a proof of correctness. So, a simple proof by induction:
1) Base case: trivially, the algorithm works if there is 1 line.
2) Inductive step: if the algorithm works for N-1 lines, it also works for N lines, because:
3) After the first N-1 iterations over an N-line file, each of the first N-1 lines is equally likely to be the current pick (probability 1 / (N-1)).
4) The Nth iteration gives line N a probability of 1 / N (because that is the probability the algorithm explicitly assigns to it on that iteration), reducing the probability of each previous line to:
1/(N-1) * (1 - 1/N)
= 1/(N-1) * (N/N - 1/N)
= 1/(N-1) * (N-1)/N
= (1 * (N-1)) / (N * (N-1))
= 1/N
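For example, with a 3-line file: after line 2, lines 1 and 2 each hold the pick with probability 1/2. Line 3 then replaces the pick with probability 1/3, so each earlier line keeps it with probability 1/2 * (1 - 1/3) = 1/2 * 2/3 = 1/3, and all three lines end up equally likely.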
If you know in advance how many lines the file contains, this algorithm is more expensive than necessary, since it always reads the entire file.
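As a rough sketch of that cheaper alternative (assuming lineCount is known and accurate; the class and method names here are made up for illustration), you can pick a random index up front and stop reading once you reach it:

using System;
using System.IO;
using System.Linq;

static class KnownCountSample
{
    // Assumes lineCount is known ahead of time and the file has at least that many lines.
    public static string RandomLine(string filename, int lineCount)
    {
        int target = new Random().Next(lineCount);            // uniform index in [0, lineCount)
        return File.ReadLines(filename).Skip(target).First(); // File.ReadLines is lazy, so only target + 1 lines are read
    }
}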
Brian