Run through the lines and calculate the length of each one. You will get something like:
0: 4 1: 6 2: 10 3: 4 ....
Compare only those lines that have the same length, since lines of different lengths cannot be equal. Working with such an index can be optimized further (for example, by storing it in a tree or some other structure keyed by length rather than in a flat array).
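A minimal sketch of that length index in Python, assuming the lines are already in memory as a list of strings and that "compare" means checking for equality (the function names here are hypothetical, not from the question):

```python
from collections import defaultdict
from itertools import combinations


def build_length_index(lines):
    """Map each line length to the indices of the lines that have that length."""
    index = defaultdict(list)
    for i, line in enumerate(lines):
        index[len(line)].append(i)
    return index


def candidate_pairs(lines):
    """Yield index pairs (i, j) with i < j whose lines have the same length."""
    for indices in build_length_index(lines).values():
        yield from combinations(indices, 2)


def find_equal_pairs(lines):
    # Assumption: plain equality stands in for whatever comparison you really need.
    return [(i, j) for i, j in candidate_pairs(lines) if lines[i] == lines[j]]
```

With lines of lengths 4, 6, 10, 4 as in the example above, only lines 0 and 3 are ever compared. A dictionary keyed by length is just one way to avoid the flat array; the saving depends on how evenly the lengths are spread.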
By the way, the second idea, the one with the file, may have to be rejected for performance reasons. Frequent random I/O against a hard drive is usually a bad idea: try to keep as much as you can in memory.