How can I measure the similarity fraction between two string sequences?
I have two text files in which the sequences are written as follows.
First file:
AAA BBB DDD CCC GGG MMM AAA MMM
Second file:
BBB DDD CCC MMM AAA MMM
How can I measure the similarity between these two files in terms of line order?
For example, the two files above are similar in the order of their lines, although some lines are missing from the second file. Which algorithm is best suited to this problem, so that I can measure how similar the order of the lines is, rather than how often each line occurs in the two files?
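To make the goal concrete, here is a minimal sketch of one possible order-aware measure, based on the longest common subsequence (LCS) of the two token lists. It is only an illustration of the kind of score I am after, not necessarily the best algorithm. The function names `lcs_length` and `order_similarity`, and the normalisation 2 * LCS / (len(a) + len(b)), are my own assumptions; the sample data is read by splitting on whitespace.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two lists (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the common subsequence ending just before both items.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the best result seen so far.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def order_similarity(a, b):
    """Assumed score in [0, 1]: 2 * LCS / (len(a) + len(b))."""
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


# Sample data from the question; split() works for space- or line-separated items.
first = "AAA BBB DDD CCC GGG MMM AAA MMM".split()
second = "BBB DDD CCC MMM AAA MMM".split()

print(lcs_length(first, second))
print(round(order_similarity(first, second), 3))
```

For the sample above, the second file is an in-order subsequence of the first, so the LCS has length 6 and the score is 12/14. Dividing by the length of the shorter list instead would give 1.0, which may be preferable if missing lines should not count against the score.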
Dheeraj agarwal