You might want to look into TFIDF and cosine similarity.
However, there are tricky cases. Say you have the following three dishes:
- Pulled pork
- Pulled egg
- Egg sandwich
Which two would you combine?
- Pulled pork and pulled egg
- Pulled egg and egg sandwich
With TFIDF you can find the most representative words in each dish name. For example, the word "sandwich" may appear in many dishes (tuna sandwich, egg sandwich, cheese sandwich, etc.), so it gets a low weight and is not very representative. That matters, because merging a tuna sandwich with a cheese sandwich is probably not a good idea.
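To make the "sandwich" example concrete, here is a minimal pure-Python sketch of TFIDF. It assumes whitespace tokenization, raw term counts and an unsmoothed IDF of log(N / df); real libraries (for example scikit-learn's `TfidfVectorizer`) use smoothed variants by default, so their exact numbers will differ. The dish list is made up for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {word: weight} dict per document.

    Assumed variant: weight = raw count of the word in the document
    times log(N / df), where N is the number of documents and df is
    the number of documents that contain the word.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: count each word at most once per document.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    idf = {word: math.log(n_docs / count) for word, count in df.items()}
    return [{word: tf * idf[word] for word, tf in Counter(tokens).items()}
            for tokens in tokenized]

# Hypothetical dish names, chosen so that "sandwich" is very common.
dishes = ["fried egg sandwich", "egg sandwich", "tuna sandwich",
          "cheese sandwich", "pulled pork"]
for dish, vec in zip(dishes, tfidf_vectors(dishes)):
    print(dish, {word: round(w, 3) for word, w in vec.items()})
```

Because "sandwich" appears in four of the five names, it gets a weight of log(5/4) ≈ 0.22, while a word that appears in only one name, such as "tuna", gets log 5 ≈ 1.61.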
Once you have the TFIDF vectors, you can compute the cosine similarity between each pair of dishes and, perhaps with a static threshold, decide whether to combine them.
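A sketch of the pairwise comparison, again in plain Python so it runs without extra packages. It repeats a small helper for the same assumed TFIDF variant (raw count times log(N / df)), computes the cosine similarity of every pair of dish names, and reports the pairs at or above a static threshold. The threshold of 0.3 and the dish names are hypothetical; you would tune the threshold on your own data.

```python
import math
from collections import Counter
from itertools import combinations

def tfidf_vectors(docs):
    # Assumed TFIDF variant: raw count * log(N / df), whitespace tokens.
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(word for tokens in tokenized for word in set(tokens))
    idf = {word: math.log(len(docs) / count) for word, count in df.items()}
    return [{word: tf * idf[word] for word, tf in Counter(tokens).items()}
            for tokens in tokenized]

def cosine(u, v):
    # Cosine similarity of two sparse vectors stored as {word: weight}.
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    # A vector is all zeros if every word in it occurs in every document.
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def merge_candidates(dishes, threshold):
    # Return (dish_a, dish_b, similarity) for each pair at or above threshold.
    vecs = tfidf_vectors(dishes)
    pairs = []
    for i, j in combinations(range(len(dishes)), 2):
        sim = cosine(vecs[i], vecs[j])
        if sim >= threshold:
            pairs.append((dishes[i], dishes[j], sim))
    return pairs

dishes = ["fried egg sandwich", "egg sandwich", "tuna sandwich",
          "cheese sandwich", "pulled pork"]
for a, b, sim in merge_candidates(dishes, threshold=0.3):
    print(f"{a!r} ~ {b!r}: cosine = {sim:.2f}")
```

With these five names only "fried egg sandwich" and "egg sandwich" pass (cosine ≈ 0.51). "Tuna sandwich" and "cheese sandwich" share only the low-weight word "sandwich" and score about 0.02, so they stay separate.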
Another problem arises: once you decide to merge two dishes, what do you call the merged entry? (Is it the egg dish or the egg sandwich?)
Update:
@alvas suggests clustering on the similarity/distance values, and I think that is a good idea. First, build an n x n distance (or similarity) matrix from the cosine similarities between the TFIDF vectors. Once you have that matrix, you can group the dishes with a clustering algorithm.
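The update can be sketched as follows. The suggestion does not name a specific algorithm; this sketch assumes single-linkage agglomerative clustering cut at a fixed distance, which is the same as taking connected components of the graph whose edges are the pairs with cosine distance (1 - similarity) at or below the cutoff. The cutoff of 0.6 and the dish names are hypothetical, and the TFIDF helper uses the same assumed variant (raw count times log(N / df)). SciPy (`scipy.cluster.hierarchy`) and scikit-learn (`AgglomerativeClustering`) provide this and other linkages if you prefer a library.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    # Assumed TFIDF variant: raw count * log(N / df), whitespace tokens.
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(word for tokens in tokenized for word in set(tokens))
    idf = {word: math.log(len(docs) / count) for word, count in df.items()}
    return [{word: tf * idf[word] for word, tf in Counter(tokens).items()}
            for tokens in tokenized]

def cosine(u, v):
    # Cosine similarity of two sparse vectors stored as {word: weight}.
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def distance_matrix(vecs):
    # n x n matrix of cosine distances (1 - cosine similarity).
    return [[1.0 - cosine(u, v) for v in vecs] for u in vecs]

def single_linkage_clusters(dist, cutoff):
    # Single linkage stopped at `cutoff` == connected components of the
    # graph with an edge wherever dist[i][j] <= cutoff (union-find).
    parent = list(range(len(dist)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(dist)):
        for j in range(i + 1, len(dist)):
            if dist[i][j] <= cutoff:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(dist)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

dishes = ["fried egg sandwich", "egg sandwich", "tuna sandwich",
          "tuna melt sandwich", "cheese sandwich", "pulled pork"]
dist = distance_matrix(tfidf_vectors(dishes))
for group in single_linkage_clusters(dist, cutoff=0.6):
    print([dishes[i] for i in group])
```

Here the two egg sandwiches form one group and the two tuna sandwiches another (cosine distance ≈ 0.47 within each pair), while "cheese sandwich" and "pulled pork" stay on their own. Single linkage is a simple choice, but it can chain dissimilar items together through intermediate ones; average or complete linkage is stricter.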