I have the following situation:
String a = "A web crawler is a computer program that automatically scans the Internet World Wide Web";
String b = "The web scanner computer program is browsing the World Wide Web";
Is there a standard algorithm or approach for calculating the percentage similarity between two such strings?
For example, in the case above, a manual estimate of the similarity would be over 90%.
My idea is to tokenize both strings and compare the number of matching tokens, something like (7 tokens / 10 tokens) * 100. But of course this method is generally not very effective. Comparing the number of matching characters also seems ineffective.
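To make the token idea concrete, here is a minimal sketch of it. It assumes tokens are lowercased words split on anything that is not a letter or digit, and it normalizes by the number of distinct tokens across both strings (the Jaccard index), which is just one possible choice of denominator. The class and method names (`TokenSimilarity`, `tokenize`, `jaccardPercent`) are made up for illustration.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class TokenSimilarity {

    // Assumption: a token is a lowercased run of letters/digits; duplicates are ignored.
    static Set<String> tokenize(String text) {
        Set<String> tokens = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Jaccard index as a percentage: |shared tokens| / |distinct tokens in either string| * 100.
    static double jaccardPercent(String a, String b) {
        Set<String> ta = tokenize(a);
        Set<String> tb = tokenize(b);
        if (ta.isEmpty() && tb.isEmpty()) {
            return 100.0; // assumption: two empty inputs count as identical
        }
        Set<String> shared = new HashSet<>(ta);
        shared.retainAll(tb);          // intersection
        Set<String> all = new HashSet<>(ta);
        all.addAll(tb);                // union
        return 100.0 * shared.size() / all.size();
    }

    public static void main(String[] args) {
        String a = "A web crawler is a computer program that automatically scans the Internet World Wide Web";
        String b = "The web scanner computer program is browsing the World Wide Web";
        System.out.println(String.format(Locale.ROOT, "%.1f%%", jaccardPercent(a, b)));
    }
}
```

For the two strings above this finds 7 shared tokens out of 15 distinct ones, about 46.7%, which is far below the manual estimate and illustrates why plain token counting seems insufficient on its own.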
Can anyone give some recommendations?
This is part of my project, a plagiarism analyzer.
Therefore, matched words will be exactly identical; synonyms do not need to be handled.
The only question in this case is how to calculate a fairly accurate percentage of similarity.
Thanks so much for any help.
java similarity
Mr CooL Mar 06 2018-10-06T00: 00Z