This is an incredibly difficult problem, and it cannot be solved in a way that always gives adequate results. I would suggest sticking to a few simple principles so that the results are at least predictable. I think you need two things: a basic morphology mechanism and a dictionary of synonyms.
Whenever a search query arrives, for each word you:
- Look for a literal match
- Normalize ("canonicalize") the word using the morphology mechanism, i.e. reduce it to singular, base form, etc., and look for matches
- Look up synonyms for the word and search for those
Then repeat for all combinations of the input words, that is, "big cats", "big cat", "huge cats", "huge cat", etc.
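The per-word steps and the combination step can be sketched roughly as follows. This is only an illustration: `normalize` is a toy stand-in for a real morphology mechanism (it just strips a plural "s"), and `SYNONYMS`, `variants` and `expand_query` are hypothetical names invented for this example.

```python
from itertools import product

# Hypothetical synonym dictionary, keyed by canonical form (assumption).
SYNONYMS = {"big": ["huge", "large"], "cat": ["kitty"]}


def normalize(word):
    """Toy morphology: lowercase and strip a plural 's' (assumption, not real stemming)."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def variants(word):
    """Return the literal form, the canonical form, and synonyms of the canonical form."""
    literal = word.lower()
    canonical = normalize(word)
    result = [literal]
    for candidate in [canonical] + SYNONYMS.get(canonical, []):
        if candidate not in result:
            result.append(candidate)
    return result


def expand_query(query):
    """Build every combination of per-word variants for the whole query."""
    per_word = [variants(w) for w in query.split()]
    return [" ".join(combo) for combo in product(*per_word)]
```

With this toy data, `expand_query("big cats")` includes "big cats", "big cat", "huge cats" and "huge cat" among its results. Note that the number of combinations grows multiplicatively with the query length, so a real system would probably want to cap it.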
In fact, you also need to store the index data in canonical form (singular, base form, etc.) alongside the literal form.
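A minimal sketch of such an index, assuming a simple in-memory inverted index and the same kind of toy `normalize` function (strip a plural "s") in place of real morphology. The names `build_index`, `lookup` and the sample `DOCS` are made up for illustration.

```python
from collections import defaultdict


def normalize(word):
    """Toy morphology: lowercase and strip a plural 's' (assumption, not real stemming)."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def build_index(docs):
    """Map each token to document ids under both its literal and canonical form."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            literal = token.lower()
            index[literal].add(doc_id)
            index[normalize(literal)].add(doc_id)
    return index


def lookup(index, word):
    """Return documents matching the word literally or by canonical form."""
    literal = word.lower()
    return index.get(literal, set()) | index.get(normalize(literal), set())


# Hypothetical sample documents.
DOCS = {1: "Big cats hunt at night", 2: "My cat sleeps all day"}
INDEX = build_index(DOCS)
```

Here both `lookup(INDEX, "cats")` and `lookup(INDEX, "cat")` find both documents, because each document is reachable through the canonical key "cat" regardless of which form it actually contains.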
As for conceptual matching, such as knowing that cats are also animals, this is where it becomes difficult. It has never really worked; if it did, Google would already return conceptual matches, but it doesn't.
mojuba