Rate paragraph content

Question

We are building a database of scientific papers and analyzing their abstracts. The goal is to be able to say: "Interest in this topic has grown by 20% compared to last year." I already tried keyword analysis and didn't really like the results. So now I'm trying to switch to phrases and the proximity of words to each other, and I'm realizing that I am in over my head. Can someone point me to a better solution, or at least give me a good term to google to find out more?

The language used is Python, but I don't think that really affects the answer. Thanks in advance.

+4

string nlp data-mining

chriscauley Nov 08 '10 at 23:30

2 answers

This is just a hunch; I'm not sure this approach will work. If you are looking at phrases and the proximity of words, perhaps you could build a Markov chain? That would give you an idea of how frequent certain phrases or words are relative to others (depending on the order of your Markov chain).

So you build a Markov chain and a frequency distribution for 2009. Then you build another at the end of 2010 and compare the frequencies of certain phrases and words. You may need to normalize the text first.
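A minimal sketch of that year-over-year comparison, using only the standard library. It counts bigrams (adjacent word pairs), which are the raw counts a first-order Markov chain would be built from, rather than constructing a full transition model. The function names and the toy abstracts are hypothetical, chosen for illustration, and the normalization (lowercasing, alphabetic tokens only) is an assumption.

```python
import re
from collections import Counter


def bigram_frequencies(abstracts):
    """Return the relative frequency of each adjacent word pair (bigram)."""
    counts = Counter()
    for text in abstracts:
        # Normalize: lowercase and keep only alphabetic tokens (an assumption).
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(zip(tokens, tokens[1:]))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bigram: n / total for bigram, n in counts.items()}


def relative_change(old_freqs, new_freqs, bigram):
    """Fractional change in a bigram's relative frequency.

    Returns None when the bigram did not occur in the old corpus,
    since growth from zero is undefined.
    """
    old = old_freqs.get(bigram, 0.0)
    if old == 0.0:
        return None
    return (new_freqs.get(bigram, 0.0) - old) / old


# Hypothetical toy corpora standing in for the 2009 and 2010 abstracts.
abstracts_2009 = ["Gene expression in yeast", "Protein folding in yeast"]
abstracts_2010 = ["Gene expression in mice", "Gene expression in yeast"]

freqs_2009 = bigram_frequencies(abstracts_2009)
freqs_2010 = bigram_frequencies(abstracts_2010)
change = relative_change(freqs_2009, freqs_2010, ("gene", "expression"))
```

Comparing relative rather than absolute frequencies keeps the result meaningful when the two years contain different numbers of abstracts. A value of `0.2` from `relative_change` would correspond to the "grown by 20%" statement in the question.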

Something else that comes to mind is natural language processing techniques (there is a lot of literature on the topic!).

+2

Vivin paliath Nov 08 '10 at 23:49

winwaed · Accepted Answer · 2010-11-08T23:45:32+0000

This is a big question, but a good introduction to this kind of NLP can be found in the NLTK toolkit. It is intended for learning and works with Python, so it is good for tutorials and experimenting. There is also a very good open source book (also available in print from O'Reilly) on the NLTK website.
