Syntactic similarity/distance between two sentences or texts using NLTK

I have the two texts below:

Text1: John loves an apple

Text2: Mike hates orange

These two texts are syntactically similar, but they have different meanings.

I want to find:

1) The syntactic distance between the two texts

2) The semantic distance between the two texts

I'm new to NLP. Is there a way to do this with NLTK?

2 answers

Yes, and you are not limited to NLTK. One way to measure syntactic distance is part-of-speech (POS) tagging, which maps each word in a sentence to a specific tag: https://en.wikipedia.org/wiki/Part-of-speech_tagging

For example, it maps your sentences roughly to:

Text1: noun verb determiner noun

Text2: noun verb noun

Then you can measure the distance between the two tag sequences, for example with an edit distance.
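
As a rough sketch of that idea, the snippet below compares two tag sequences with a plain Levenshtein edit distance. The Penn Treebank tags are written by hand here as an assumption; in practice they could come from something like `nltk.pos_tag(nltk.word_tokenize(text))`, which needs the tokenizer and tagger data downloaded first and may tag some words differently. The function names `edit_distance` and `tag_similarity` are my own, not part of NLTK's API in this form.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (here: lists of POS tags)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def tag_similarity(a, b):
    """Normalise the distance to [0, 1], where 1 means identical tag sequences."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - edit_distance(a, b) / longest


# Hand-assigned tags (an assumption, not real tagger output):
tags1 = ["NNP", "VBZ", "DT", "NN"]  # John loves an apple
tags2 = ["NNP", "VBZ", "NN"]        # Mike hates orange

print(edit_distance(tags1, tags2))
print(tag_similarity(tags1, tags2))
```

Dividing by the longer sequence is just one possible normalisation; it keeps short and long sentences comparable.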


For semantics, you need a semantic network of words (WordNet, which is available through NLTK, is one option). Find the synonyms of each word in a sentence, then measure how much the synonym sets of the two sentences overlap.
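
A minimal sketch of the synonym-overlap idea, assuming a tiny hand-made synonym table in place of a real semantic network. The `SYNONYMS` dictionary and the function names are hypothetical and only for illustration; a real setup would look the synonyms up in something like WordNet. The overlap is scored with a Jaccard ratio, which is my choice, not something the approach requires.

```python
# Hypothetical toy synonym table standing in for a real semantic network.
SYNONYMS = {
    "loves": {"loves", "likes", "adores"},
    "likes": {"likes", "loves", "enjoys"},
    "hates": {"hates", "detests", "loathes"},
}


def expand(sentence):
    """Return the set of words in the sentence plus their known synonyms."""
    expanded = set()
    for word in sentence.lower().split():
        # Words missing from the table only contribute themselves.
        expanded |= SYNONYMS.get(word, {word})
    return expanded


def synonym_overlap(s1, s2):
    """Jaccard overlap between the expanded word sets of two sentences."""
    a, b = expand(s1), expand(s2)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


print(synonym_overlap("John loves an apple", "Mike hates orange"))
print(synonym_overlap("John loves an apple", "John likes an apple"))
```

With this toy table the two sentences from the question share nothing, while swapping "loves" for "likes" still gives a high overlap.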


For semantics, you can try word2vec. You can simply average the similarities of the words in the sentences, or you can come up with your own way of weighting words according to their syntactic role.

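    from gensim.models import Word2Vec

    # Load a previously trained model; the path is a placeholder
    model = Word2Vec.load('path/to/your/model')
    # In gensim 4.x, similarity lives on the KeyedVectors object (model.wv)
    model.wv.similarity('apple', 'orange')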
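
One possible reading of "average the similarity of words" is sketched below: for each word in the first sentence, take its best cosine similarity against the words of the second sentence, then average those scores. The `TOY_VECTORS` table holds made-up 3-dimensional vectors purely so the example runs without a trained model; with gensim you would look the vectors up in `model.wv` instead. All names here are hypothetical.

```python
import math

# Made-up toy embeddings (an assumption); real word2vec vectors
# typically have a few hundred dimensions.
TOY_VECTORS = {
    "apple":  [0.9, 0.1, 0.0],
    "orange": [0.8, 0.2, 0.0],
    "loves":  [0.0, 0.9, 0.4],
    "hates":  [0.0, 0.9, -0.4],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sentence_similarity(s1, s2, vectors=TOY_VECTORS):
    """Average, over known words of s1, of the best match among known words of s2."""
    words1 = [w for w in s1.lower().split() if w in vectors]
    words2 = [w for w in s2.lower().split() if w in vectors]
    if not words1 or not words2:
        return 0.0  # nothing to compare
    best = [max(cosine(vectors[a], vectors[b]) for b in words2) for a in words1]
    return sum(best) / len(best)


print(sentence_similarity("John loves an apple", "Mike hates orange"))
```

Note that this score is directional (it depends on which sentence comes first) and silently skips out-of-vocabulary words such as "John". Averaging both directions or weighting words by POS tag are straightforward variations.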