Comparison of sentences by their meaning

Question


Python provides the NLTK library, an extensive resource of texts and corpora, along with many text-mining and text-processing techniques. Is there a way to compare sentences by the meaning they convey, in order to find possible matches? In other words, something like an intelligent suggestion assistant?

For example, take the sentences "giggling at bad jokes" and "I like to laugh myself silly at poor jokes". Both convey the same meaning, but the sentences do not match even remotely (the words are different, so Levenshtein distance will fail!).
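A quick way to see why character-level edit distance fails here is to compute it. The sketch below is a plain-Python Levenshtein distance (NLTK also ships one as `nltk.edit_distance`); normalizing by the length of the longer string in `similarity` is an illustrative choice, not a standard definition.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1], where 1 means identical strings."""
    return 1 - levenshtein(a, b) / max(len(a), len(b), 1)


same_meaning = ("giggling at bad jokes", "I like to laugh myself silly at poor jokes")
different_meaning = ("giggling at bad jokes", "juggling at bad jokes")

print(round(similarity(*same_meaning), 2))       # at most 0.5: the lengths differ by 21
print(round(similarity(*different_meaning), 2))  # → 0.9 (only 2 substitutions)
```

The paraphrase can score no higher than 0.5, because 21 insertions are needed just to make up the length difference, while "juggling at bad jokes", which means something else entirely, scores about 0.9.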

Now imagine that we have an API that provides functionality such as the one found here. Based on it, we have a mechanism to find out that the words "giggle" and "laugh" are close in the meaning they convey. "Bad" will not match "poor", though, so we may need additional layers (for example, the two match in the context of a word like "joke", since a bad joke is usually the same as a poor joke, although a bad person is not the same as a poor person!).
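The layering described above can be sketched as a normalization step that maps words to concept labels, plus a context rule that merges "poor" into "bad" only when the next word is a joke. The `LEXICON` and `CONTEXT_RULES` tables are hypothetical, hand-written stand-ins for whatever synonym API is available; a real system would fill them from a thesaurus or a trained model.

```python
# Hypothetical lexicon standing in for a synonym API: word -> concept label.
LEXICON = {
    "giggling": "LAUGH", "giggle": "LAUGH", "laugh": "LAUGH", "laughing": "LAUGH",
    "joke": "JOKE", "jokes": "JOKE",
    "person": "PERSON",
    "bad": "BAD", "poor": "POOR",
}

# Hypothetical context rules: (modifier concept, following concept) -> merged concept.
# "poor" is treated as "bad" only when it modifies a joke.
CONTEXT_RULES = {
    ("POOR", "JOKE"): "BAD",
}


def concepts(sentence):
    """Map each word to a concept label, applying context rules on adjacent pairs."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    labels = [LEXICON.get(w, w) for w in words]  # unknown words pass through unchanged
    out = []
    for i, label in enumerate(labels):
        following = labels[i + 1] if i + 1 < len(labels) else None
        out.append(CONTEXT_RULES.get((label, following), label))
    return out


print(concepts("giggling at bad jokes"))
print(concepts("I like to laugh myself silly at poor jokes"))
print(concepts("a poor person"), concepts("a bad person"))
```

Both example sentences now contain the concepts LAUGH, BAD and JOKE, while "a poor person" and "a bad person" stay distinct because the merge rule is keyed on the following word.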

The main problem is discarding the parts that do not significantly change the meaning of the sentence. The algorithm should therefore return the same degree of match between the first sentence and the following one: "I like to laugh myself silly at poor jokes, even though they are completely senseless, full of crap and carry a serious risk of heart attack!"
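One way to keep extra clauses from lowering the score is to measure how much of the shorter query the candidate covers, instead of using a symmetric overlap such as Jaccard. This sketch assumes both sentences have already been reduced to content concepts by some earlier normalization step; the concept labels below are hypothetical.

```python
def jaccard(a, b):
    """Symmetric overlap: shared concepts divided by all distinct concepts."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def coverage(query, candidate):
    """Share of the query's concepts that also occur in the candidate."""
    q = set(query)
    return len(q & set(candidate)) / len(q)


# Hypothetical concept lists, e.g. the output of a synonym-normalization step.
query = ["LAUGH", "BAD", "JOKE"]
paraphrase = ["LAUGH", "SILLY", "BAD", "JOKE"]
extended = paraphrase + ["SENSELESS", "CRAP", "HEART_ATTACK"]

print(round(jaccard(query, paraphrase), 2), round(jaccard(query, extended), 2))  # → 0.75 0.43
print(coverage(query, paraphrase), coverage(query, extended))                    # → 1.0 1.0
```

The trade-off is that coverage is asymmetric and ignores whatever the candidate adds, so an appended clause that contradicts the query would not be penalized either.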

So, with what is already available, is there an existing algorithm designed for this? Or do I need to reinvent the wheel?


python nltk data-mining

SexyBeast Feb 13 '13 at 11:11


1 answer

bendaizer · Accepted Answer · 2013-02-14T13:51:04+0000

You will need a more advanced topic modeling algorithm and, of course, a corpus to train your model on, so that it can easily deal with synonyms such as giggling and laughing!

In Python, you can try the gensim package (http://radimrehurek.com/gensim/). I have never used it, but it includes classic semantic vector space methods such as LSA/LSI, random projection and even LDA.
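To illustrate what LSA/LSI does with a training corpus, here is a small NumPy sketch of the technique itself rather than gensim's API (gensim wraps it as `gensim.models.LsiModel`). It builds a term-document matrix from a hypothetical six-document toy corpus with hand-stemmed words, keeps the top k singular vectors, and folds new texts into that latent space. Because "giggle" and "laugh" co-occur with the same words ("funny", "joke") in the corpus, they end up close together.

```python
import numpy as np

# Hypothetical toy corpus with two themes (humour, finance); words are pre-stemmed.
corpus = [
    "giggle funny joke",
    "laugh funny joke",
    "giggle laugh funny",
    "tax bank money",
    "bank loan money",
    "tax loan bank",
]

vocab = sorted({w for doc in corpus for w in doc.split()})
index = {w: i for i, w in enumerate(vocab)}


def bow(text):
    """Raw term-count vector over the toy vocabulary (unknown words are ignored)."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in index:
            v[index[w]] += 1
    return v


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Term-document matrix: rows are terms, columns are documents.
X = np.column_stack([bow(d) for d in corpus])

# Truncated SVD: keep only the k strongest latent "concepts".
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_k, s_k = U[:, :k], s[:k]


def fold_in(text):
    """Project a new text into LSI space: diag(1/s_k) @ U_k.T @ q."""
    return (U_k.T @ bow(text)) / s_k


a, b = "giggle joke", "laugh joke"
print(round(cosine(bow(a), bow(b)), 3))          # → 0.5 (only "joke" is shared)
print(round(cosine(fold_in(a), fold_in(b)), 3))  # close to 1 in the latent space
```

On this toy corpus a finance text such as "bank money" lands near 0 similarity to both queries. On real data, the corpus, the preprocessing (stemming, TF-IDF weighting) and k all need tuning.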

My personal favorite is random projection, because it is faster and still very efficient (though I do this in Java, with a different library).
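Random projection skips the SVD entirely: every document vector is multiplied by a fixed random matrix, which is why it is cheap, and by the Johnson-Lindenstrauss lemma pairwise cosine similarities are approximately preserved in the smaller space. Below is a minimal pure-Python sketch with Gaussian entries; the dimensions, seed and toy vectors are arbitrary illustrative choices (gensim provides this technique as `gensim.models.RpModel`).

```python
import math
import random


def random_projection_matrix(d, k, seed=0):
    """k x d matrix with independent N(0, 1/k) entries."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(R, v):
    """Map a d-dimensional vector v to k dimensions (matrix-vector product)."""
    return [sum(r * x for r, x in zip(row, v)) for row in R]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def bag(indices, d):
    """Toy binary bag-of-words vector with 1s at the given term indices."""
    v = [0.0] * d
    for i in indices:
        v[i] = 1.0
    return v


d, k = 1000, 256
v1 = bag(range(10), d)                # 10 terms
v2 = bag(list(range(9)) + [10], d)    # shares 9 of them with v1
v3 = bag(range(100, 110), d)          # shares none with v1

R = random_projection_matrix(d, k, seed=42)
p1, p2, p3 = (project(R, v) for v in (v1, v2, v3))

print(round(cosine(v1, v2), 2), round(cosine(p1, p2), 2))  # exact 0.9 vs. projected estimate
print(round(cosine(v1, v3), 2), round(cosine(p1, p3), 2))  # exact 0.0 vs. projected estimate
```

The projected values depend on the seed, but they should stay close to the exact ones, with the error shrinking roughly like 1/sqrt(k).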
