I have a huge list of people's names that I need to search for in a huge text.
Only part of a name may appear in the text, and it may be misspelled, mistyped, or abbreviated. The text is not tokenized, so I do not know where in the text a person's name begins. I also do not know whether a given name appears in the text at all.
Example:
I have “Barack Hussein Obama” on my list, so I have to check the presence of this name in the following texts:
- ... Candidate Barack Obama was elected President of the United States ... (incomplete)
- ... Candidate Barack Hussein was elected President of the United States ... (incomplete)
- ... Candidate Barack H. O .. was elected President of the United States ... (abbreviated)
- ... Candidate Barack Oban was elected President of the United States ... (misspelled)
- ... Candidate Barack Ovama was elected President of the United States ... (mistyped, B is next to V on the keyboard)
- ... Candidate John McCain lost the election ... (Obama's name does not appear)
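As a rough illustration of the cases above, a naive baseline could look like the following Python sketch. It slides a window of words over the text and scores each name part against the window using `difflib.SequenceMatcher` from the standard library. The function names (`token_score`, `name_score`, `probably_mentions`) and the thresholds 0.6 and 1.5 are arbitrary assumptions made for this example, and the approach is far too slow for a huge list and a huge text; it only shows what "partial, misspelled, or abbreviated" could mean in code.

```python
import re
from difflib import SequenceMatcher


def token_score(name_tok: str, text_tok: str, min_ratio: float = 0.6) -> float:
    """Score one word of the text against one part of the name (0.0 to 1.0)."""
    a, b = name_tok.lower(), text_tok.lower()
    # A single letter is treated as a possible initial ("H." for "Hussein").
    if len(b) == 1:
        return 0.5 if a.startswith(b) else 0.0
    ratio = SequenceMatcher(None, a, b).ratio()
    # Weak similarities are discarded so unrelated words do not add up.
    return ratio if ratio >= min_ratio else 0.0


def name_score(name: str, text: str) -> float:
    """Best score of the name over all windows of len(name parts) words."""
    name_toks = name.split()
    text_toks = re.findall(r"[A-Za-z]+", text)
    n = len(name_toks)
    best = 0.0
    for i in range(len(text_toks)):
        window = text_toks[i:i + n]
        # Each name part takes its best-matching word in the window.
        # Naive: the same word may be counted for two different name parts.
        score = sum(max(token_score(nt, wt) for wt in window) for nt in name_toks)
        best = max(best, score)
    return best


def probably_mentions(name: str, text: str, min_score: float = 1.5) -> bool:
    """Heuristic decision: roughly 'at least two name parts were found'."""
    return name_score(name, text) >= min_score


name = "Barack Hussein Obama"
print(probably_mentions(name, "Candidate Barack Obama was elected President"))
print(probably_mentions(name, "Candidate John McCain lost the election"))
```

With these assumed thresholds, an exact name part counts 1.0, a close misspelling counts its similarity ratio, and an initial counts 0.5, so a full word plus two initials or two recognizable words are enough to flag a window.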
Obviously there is no deterministic solution for this, but ...
What is a good heuristic for such a search?
If you had to, how would you do it?