Named Entity Recognition with NLTK: relevance of extracted keywords

I was looking at the Named Entity Recognition function in NLTK. Is there a way to find out which of the extracted keywords are most relevant to the source text? Also, is it possible to get the type (Person / Organization) of each extracted keyword?

1 answer

If you have a trained tagger, you can tag your text with parts of speech first and then use the NE classifier that comes with NLTK.

The tagged text should be a list of (word, tag) tuples:

 sentence = 'The UN'
 tagged_sentence = [('The', 'DT'), ('UN', 'NNP')]

The NE classifier is then called as:

 nltk.ne_chunk(tagged_sentence) 

This returns a tree. Recognized entities appear as subtrees within the main structure, and the label of each subtree tells you whether the entity is a PERSON, ORGANIZATION or GPE.
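As a rough sketch of how to read the entity types back out of that tree: the helper below (the name `extract_entities` is mine, not part of NLTK) walks the top level of the result and collects each entity with its label. It assumes NLTK 3, where subtrees expose `label()` and `leaves()`, and it assumes entities sit directly under the root, which is how `ne_chunk` lays out its output.

```python
def extract_entities(chunked):
    """Return (entity_text, entity_type) pairs from an ne_chunk result."""
    entities = []
    for node in chunked:
        # Named entities are nested subtrees with a label();
        # ordinary tokens are plain (word, tag) tuples.
        if hasattr(node, "label"):
            text = " ".join(word for word, tag in node.leaves())
            entities.append((text, node.label()))
    return entities

# Usage (needs NLTK's NE chunker models to be installed):
#   import nltk
#   tree = nltk.ne_chunk(tagged_sentence)
#   print(extract_entities(tree))
```

Multi-word entities come back joined into one string, so a name made of several tokens stays a single entry together with its type.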

To find the most relevant terms, you first have to define a measure of "relevance". Usually tf-idf is used, but if you are looking at only one document, raw frequency may be sufficient.
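If you do have several documents, a minimal tf-idf sketch could look like the following. This is one common unsmoothed variant (term frequency normalized by document length, times the log of inverse document frequency), written in plain Python; the function name `tf_idf` and the sample documents are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: score} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical, already tokenized documents.
docs = [
    ["the", "UN", "met", "in", "Geneva"],
    ["the", "council", "met", "today"],
]
scores = tf_idf(docs)
print(sorted(scores[0].items(), key=lambda item: item[1], reverse=True))
```

Terms that occur in every document (here "the" and "met") score zero, which is exactly why tf-idf needs more than one document to be useful.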

Calculating the frequency of each word in a document is easy with NLTK. First you need to load your corpus; once you have loaded it and have a Text object, just call:

 relevant_terms_sorted_by_freq = [word for word, _ in nltk.FreqDist(corpus).most_common()]

Finally, you can filter out every word in relevant_terms_sorted_by_freq that is not in the list of NE words.
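Put together, the counting and filtering steps might look roughly like this. I use `collections.Counter` so the snippet runs without any NLTK data; in NLTK 3 `nltk.FreqDist` is a subclass of `Counter`, so it can be swapped in directly. The function name `rank_entities`, the token list and the entity set are placeholders for your own data.

```python
from collections import Counter

def rank_entities(tokens, entity_words):
    """Return the entity words found in `tokens`, most frequent first."""
    counts = Counter(tokens)  # nltk.FreqDist(tokens) works the same way here
    return [word for word, _ in counts.most_common() if word in entity_words]

# Hypothetical tokenized document and NE word set.
tokens = ["The", "UN", "said", "Geneva", "would", "host", "the", "UN", "talks"]
entity_words = {"UN", "Geneva"}

print(rank_entities(tokens, entity_words))
```

Keeping the entity words in a set makes the membership check cheap even for long documents.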

NLTK offers an online version of the full book, which I recommend as a starting point.
