I am developing a program that compares a short text (say, 250 characters) with a collection of similar texts (about 1000-2000 texts).
The goal is to determine whether text A is similar to one or more texts in the collection and, if so, to retrieve the matching text(s) by identifier. Each text has a unique identifier.
I would like the result to look like one of the following:
Option 1: Text A corresponds to text B with 90% similarity, text C with 70% similarity, etc.
Option 2: Text A matches text D most closely (highest similarity).
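A simple baseline that produces both result formats is to represent each text as a bag of character n-grams and compare the bags with cosine similarity. This is only one common technique, not necessarily the best one for this data. The sketch assumes the collection is a Python dict mapping identifier to text. The function names (`char_ngrams`, `rank_similar`, `best_match`), the trigram size `n=3` and the scaling to a percentage are illustrative choices, not part of the question. The resulting "percent" is a cosine score multiplied by 100, not a calibrated probability.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of a lower-cased, whitespace-normalized text."""
    t = " ".join(text.lower().split())
    if len(t) < n:
        return Counter([t]) if t else Counter()
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())  # missing grams count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(query, collection, threshold=0.0):
    """Option 1: (identifier, similarity %) pairs, most similar first, at or above threshold."""
    q = char_ngrams(query)
    scores = [(doc_id, 100.0 * cosine(q, char_ngrams(text)))
              for doc_id, text in collection.items()]
    scores = [s for s in scores if s[1] >= threshold]
    return sorted(scores, key=lambda s: s[1], reverse=True)


def best_match(query, collection):
    """Option 2: identifier of the single most similar text, or None if the collection is empty."""
    ranked = rank_similar(query, collection)
    return ranked[0][0] if ranked else None


if __name__ == "__main__":
    # Hypothetical example collection: identifier -> text.
    collection = {
        "B": "The quick brown fox jumps over the lazy dog.",
        "C": "A quick brown dog jumps over a lazy fox.",
        "D": "Completely unrelated sentence about databases.",
    }
    query = "The quick brown fox jumped over the lazy dog."
    for doc_id, score in rank_similar(query, collection):
        print(f"{doc_id}: {score:.1f}%")
    print("Best match:", best_match(query, collection))  # "B" ranks first here
```

With only 1000-2000 texts of about 250 characters, comparing the query against every text one by one like this is usually cheap enough. A TF-IDF weighting of the n-grams, which down-weights n-grams that occur in most texts, is a common refinement of the same idea.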
I studied some machine learning at school, but I'm not sure which algorithm is best suited to this problem, or whether I should look into NLP (a subject I'm not familiar with).
Does anyone have a suggestion for which algorithm to use, or where I can find scientific literature on this problem?
Thanks for any input!
RobertH