I have an array of strings: not many (maybe a few hundred), but often long (a few hundred characters each).
Most of these strings are arbitrary and differ from one another. But a small group of them, maybe 5 out of 300, are very similar. In fact, they are the same string, differing only in formatting, punctuation, and a few words.
How can I identify this group of strings?
By the way, I'm writing in Ruby, but failing that, an algorithm in pseudocode would be fine.
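To make the goal concrete, here is a minimal sketch of the kind of grouping I am after. It assumes that ignoring case and punctuation and then comparing word sets (Jaccard similarity) is good enough. The method names (`normalized_words`, `jaccard`, `group_similar`) and the 0.6 threshold are made up for illustration, not a known solution.

```ruby
require 'set'

# Lowercase the string, drop punctuation, and return its words as a Set.
def normalized_words(str)
  str.downcase.scan(/[[:alnum:]]+/).to_set
end

# Jaccard similarity of two sets: |A & B| / |A | B|, from 0.0 to 1.0.
def jaccard(a, b)
  union_size = (a | b).size
  return 0.0 if union_size.zero?
  (a & b).size.to_f / union_size
end

# Greedy grouping: each string joins the first group whose first member
# is similar enough; otherwise it starts a new group.
# Only groups with more than one member are returned.
def group_similar(strings, threshold: 0.6)
  groups = []
  strings.each do |s|
    words = normalized_words(s)
    group = groups.find { |g| jaccard(g[:words], words) >= threshold }
    if group
      group[:members] << s
    else
      groups << { words: words, members: [s] }
    end
  end
  groups.map { |g| g[:members] }.select { |members| members.size > 1 }
end
```

With only a few hundred strings, comparing each one against every existing group should be cheap enough. If the similar strings differ by characters rather than whole words, an edit-distance measure such as Levenshtein distance might be a better fit than word sets, and the threshold would need tuning on real data.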
Thanks.
string algorithm ruby grouping similarity
luca