I am working in Java to find the similarity between two documents. I would prefer to measure semantic similarity, but I haven't attempted that yet. I use the following approach:
- Extract terms/tokens (I use JAWS with WordNet to remove synonyms, thereby improving affinity)
- Create a term matrix
- Apply LSA
- Compute the cosine similarity
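The token, term-matrix, and cosine steps above can be sketched in plain Java as follows. This is a minimal sketch under stated assumptions, not the pipeline described in the question. It tokenizes by lowercasing and splitting on non-alphanumeric characters, uses raw term counts as weights, and stores one row per document. It omits the JAWS/WordNet synonym step and the LSA step; LSA would apply a truncated SVD to this matrix before the vectors are compared. The names `DocSimilarity`, `tokenize`, `termMatrix`, and `cosine` are hypothetical.

```java
import java.util.*;

/**
 * Hypothetical sketch: bag-of-words term vectors plus cosine similarity.
 * Assumes simple lowercase/regex tokenization. The synonym (WordNet)
 * and LSA (SVD) steps are intentionally NOT included.
 */
public class DocSimilarity {

    // Lowercase the text and split on any run of non-letter, non-digit characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Build a shared vocabulary and one term-frequency row per document.
    static double[][] termMatrix(List<List<String>> docs, List<String> vocabOut) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (List<String> doc : docs)
            for (String t : doc)
                index.putIfAbsent(t, index.size());
        vocabOut.addAll(index.keySet());
        double[][] m = new double[docs.size()][index.size()];
        for (int d = 0; d < docs.size(); d++)
            for (String t : docs.get(d))
                m[d][index.get(t)] += 1.0;
        return m;
    }

    // cos(a, b) = (a . b) / (|a| |b|); returns 0 if either vector is all zeros.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        if (na == 0 || nb == 0) return 0.0;
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            tokenize("The cat sat on the mat."),
            tokenize("A cat sat on a mat."),
            tokenize("Stock prices fell sharply."));
        List<String> vocab = new ArrayList<>();
        double[][] m = termMatrix(docs, vocab);
        System.out.printf(Locale.ROOT, "doc0 vs doc1: %.3f%n", cosine(m[0], m[1])); // prints "doc0 vs doc1: 0.500"
        System.out.printf(Locale.ROOT, "doc0 vs doc2: %.3f%n", cosine(m[0], m[2])); // prints "doc0 vs doc2: 0.000"
    }
}
```

The first two sentences share "cat", "sat", "on", and "mat", so they score 0.500. The third shares no terms with the first and scores 0.000. Raw counts capture only lexical overlap. That is why a synonym step or a latent-space step such as LSA is needed to approach semantic similarity.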
While looking through several Stack Overflow pages, I found quite a few links to Python implementations.
I would like to know whether Python is the best language for finding text similarity, and also whether I can find the semantic similarity between two documents in Python.