If you have a trained tagger, you can mark your text first and then use the NE classifier that comes with NLTK.
Tagged text should be presented as a list of (word, tag) tuples:

sentence = 'The UN'
tagged_sentence = [('The', 'DT'), ('UN', 'NNP')]
Then the NE classifier is called as:
nltk.ne_chunk(tagged_sentence)
This returns a tree; the classified words appear as subtrees within the main structure. The subtree labels indicate whether an entity is a PERSON, ORGANIZATION, or GPE.
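To make the tree-walking step concrete, here is a sketch of pulling the named entities out of an ne_chunk-style result. The real return value is an nltk.Tree; this example uses a plain-tuple stand-in with the same shape so it runs without NLTK or its models installed (an illustrative assumption, not NLTK's actual class).

```python
# Stand-in for ne_chunk output: leaves are (word, tag) tuples,
# NE subtrees are (label, [leaves]) pairs.

def extract_entities(chunked):
    """Collect (label, phrase) pairs for the named-entity subtrees."""
    entities = []
    for node in chunked:
        # NE subtrees carry a label such as PERSON, ORGANIZATION or GPE;
        # plain (word, tag) leaves are skipped.
        if isinstance(node, tuple) and isinstance(node[1], list):
            label, leaves = node
            phrase = " ".join(word for word, tag in leaves)
            entities.append((label, phrase))
    return entities

# Shape of nltk.ne_chunk([('The', 'DT'), ('UN', 'NNP')]):
chunked_sentence = [('The', 'DT'), ('ORGANIZATION', [('UN', 'NNP')])]
print(extract_entities(chunked_sentence))  # [('ORGANIZATION', 'UN')]
```

With the real nltk.Tree you would instead check `isinstance(node, nltk.Tree)` and read `node.label()`, but the traversal logic is the same.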
To find the most relevant terms, you must first define a measure of "relevance." Usually tf-idf is used, but if you are looking at only a single document, raw frequency may be sufficient.
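For reference, a minimal tf-idf sketch (one common weighting variant; schemes differ, and the documents here are made up for illustration):

```python
import math

# Three toy documents, each a list of tokens.
docs = [["the", "un", "charter"], ["the", "treaty"], ["un", "resolution"]]

def tf_idf(term, doc, docs):
    """Term frequency in one document, scaled down by how many
    documents in the collection contain the term."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in docs if term in d)       # document frequency
    idf = math.log(len(docs) / df)
    return tf * idf

# "charter" occurs in only one document, so it outscores the
# ubiquitous "the" even though both appear once in docs[0].
score = tf_idf("charter", docs[0], docs)
```

With a single document there is no meaningful document frequency, which is why plain term frequency is the natural fallback.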
Calculating the frequency of each word in a document is easy with NLTK. First load your corpus; once you have it as a Text object, just call:

relevant_terms_sorted_by_freq = [w for w, _ in nltk.probability.FreqDist(corpus).most_common()]

(In NLTK 3, FreqDist.keys() no longer returns words sorted by frequency, so most_common() is the reliable way to get a frequency-ranked list.)
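FreqDist behaves essentially like a Counter over tokens, so the same idea can be sketched with collections.Counter, which runs without NLTK installed (the token list here is invented for illustration):

```python
from collections import Counter

# Toy token stream standing in for a tokenized corpus.
tokens = "the un is an organization the un".split()

# Counter plays the role of nltk.probability.FreqDist here.
freq = Counter(tokens)

# most_common() yields (word, count) pairs sorted by frequency,
# the dependable way to get a frequency-ranked term list.
relevant_terms_sorted_by_freq = [w for w, _ in freq.most_common()]
print(relevant_terms_sorted_by_freq[0])  # 'the'
```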
Finally, filter relevant_terms_sorted_by_freq, keeping only the words that also appear in the NE word list.
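The final filtering step is just a membership check that preserves the frequency ordering; both lists below are hypothetical placeholders for your real results:

```python
# Frequency-ranked terms (as produced by the previous step) and the
# set of words the NE classifier recognized -- both invented here.
relevant_terms_sorted_by_freq = ["the", "un", "peacekeeping", "of"]
ne_words = {"un", "peacekeeping"}

# Keep only the named entities, still sorted by frequency.
relevant_entities = [t for t in relevant_terms_sorted_by_freq if t in ne_words]
print(relevant_entities)  # ['un', 'peacekeeping']
```

Using a set for ne_words keeps each lookup O(1), which matters if the corpus vocabulary is large.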
NLTK offers an online version of the full book, which I'm interested in starting with.