This might have been better as a comment, but I don't have the required reputation yet.
I have not used ConllCorpusReader before (would you consider uploading the file you are trying to load to a gist and providing a link? It would be much easier to check), but I wrote a blog post that might help with the tree-parsing aspect: here.
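Since I haven't seen your file, here is only a minimal sketch of loading CoNLL-style data with `ConllCorpusReader`. It assumes a three-column layout (word, POS tag, IOB chunk tag) with sentences separated by blank lines; the sample text and the `sample.conll` file name are made up, so adjust `columntypes` to whatever columns your file actually has.

```python
import os
import tempfile

from nltk.corpus.reader import ConllCorpusReader

# Hypothetical sample in a CoNLL-2000-like layout: word, POS tag, IOB chunk tag.
# Your real file may use different columns; change `columntypes` to match.
SAMPLE = """\
The DT B-NP
cat NN I-NP
chased VBD B-VP
the DT B-NP
mouse NN I-NP
. . O

Dogs NNS B-NP
bark VBP B-VP
. . O
"""

# Write the sample to a temporary directory so the reader has a corpus root.
corpus_dir = tempfile.mkdtemp()
with open(os.path.join(corpus_dir, "sample.conll"), "w", encoding="utf8") as f:
    f.write(SAMPLE)

# Sentences are separated by blank lines; columns are split on whitespace.
reader = ConllCorpusReader(corpus_dir, ["sample.conll"], ("words", "pos", "chunk"))

tagged = reader.tagged_sents()    # one list of (word, tag) pairs per sentence
chunked = reader.chunked_sents()  # one Tree per sentence, built from the IOB column

print(tagged[0])
print(chunked[0])
```

`tagged_sents()` gives you lists of (word, tag) pairs, which is the form the chunker below expects, while `chunked_sents()` only works if your file really has a chunk column.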
In particular, you probably want to chunk each sentence. Chapter 7 of the NLTK book has more information on this, but here is an example from my blog:
```python
import nltk

# This grammar is described in the paper by S. N. Kim,
# T. Baldwin, and M.-Y. Kan.
# Evaluating n-gram based evaluation metrics for automatic
# keyphrase extraction.
# Technical report, University of Melbourne, Melbourne 2010.
grammar = r"""
    NBAR:
        # Nouns and Adjectives, terminated with Nouns
        {<NN.*|JJ>*<NN.*>}

    NP:
        {<NBAR>}
        # Above, connected with in/of/etc...
        {<NBAR><IN><NBAR>}
"""
chunker = nltk.RegexpParser(grammar)

# postoks is a POS-tagged sentence: a list of (word, tag) pairs,
# e.g. the output of nltk.pos_tag(tokens)
tree = chunker.parse(postoks)
```
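To see what the chunker produces without needing a tagger or a corpus, here is a minimal sketch that feeds it a hand-tagged sentence (the sentence and the `noun_phrases` helper are made up for illustration). It also shows a quirk worth knowing: within a stage, `RegexpParser` applies the rules in order, and once `{<NBAR>}` has chunked every NBAR, `{<NBAR><IN><NBAR>}` can no longer match, so "retrieval of scientific papers" comes out as two NPs. Swapping the two NP rules (my assumption about the intended behaviour) lets the longer pattern match first.

```python
import nltk

# Grammar from the snippet above, unchanged.
grammar = r"""
    NBAR:
        # Nouns and Adjectives, terminated with Nouns
        {<NN.*|JJ>*<NN.*>}

    NP:
        {<NBAR>}
        # Above, connected with in/of/etc...
        {<NBAR><IN><NBAR>}
"""

# Variant with the two NP rules swapped so the longer rule is tried first.
reordered_grammar = r"""
    NBAR:
        {<NN.*|JJ>*<NN.*>}

    NP:
        {<NBAR><IN><NBAR>}
        {<NBAR>}
"""


def noun_phrases(tree):
    """Return the words of each NP subtree joined into one string."""
    return [" ".join(word for word, tag in st.leaves())
            for st in tree.subtrees(filter=lambda t: t.label() == "NP")]


# Hypothetical hand-tagged sentence standing in for `postoks`.
postoks = [("Automatic", "JJ"), ("keyphrase", "NN"), ("extraction", "NN"),
           ("improves", "VBZ"), ("retrieval", "NN"), ("of", "IN"),
           ("scientific", "JJ"), ("papers", "NNS"), (".", ".")]

original = noun_phrases(nltk.RegexpParser(grammar).parse(postoks))
reordered = noun_phrases(nltk.RegexpParser(reordered_grammar).parse(postoks))
print(original)   # ['Automatic keyphrase extraction', 'retrieval', 'scientific papers']
print(reordered)  # ['Automatic keyphrase extraction', 'retrieval of scientific papers']
```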
Note: you could also use a context-free grammar (covered in Chapter 8).
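A context-free grammar gives you a full parse tree rather than flat chunks. Here is a minimal sketch using `nltk.CFG.fromstring` and `nltk.ChartParser`; the toy grammar is made up for illustration and only knows these six words, so a grammar for real data would need far more rules.

```python
import nltk

# Toy grammar (hypothetical): a sentence is an NP followed by a VP.
cfg = nltk.CFG.fromstring("""
    S   -> NP VP
    NP  -> Det N | Det Adj N
    VP  -> V NP
    Det -> 'the'
    Adj -> 'small'
    N   -> 'cat' | 'mouse'
    V   -> 'chased'
""")
parser = nltk.ChartParser(cfg)

tokens = "the small cat chased the mouse".split()
trees = list(parser.parse(tokens))  # every parse the grammar allows
for t in trees:
    print(t)
```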
Each chunk (in this example, each noun phrase, as defined by the grammar above) will be a subtree. To access these subtrees, you can use this function:
```python
def leaves(tree):
    """Finds NP (noun phrase) leaf nodes of a chunk tree."""
    # NLTK 3 exposes a tree's label as t.label(); NLTK 2 used the t.node attribute
    for subtree in tree.subtrees(filter=lambda t: t.label() == 'NP'):
        yield subtree.leaves()
```
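A quick usage sketch on a small chunk tree built by hand (the tree is hypothetical, shaped like `RegexpParser` output). It uses `t.label()`, which is how NLTK 3 exposes a tree's label. Note that `leaves()` flattens any nested subtrees, such as the NBAR level from the grammar above, so you always get a flat list of (word, tag) pairs.

```python
import nltk


def leaves(tree):
    """Finds NP (noun phrase) leaf nodes of a chunk tree."""
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP"):
        yield subtree.leaves()


# Hand-built chunk tree (hypothetical example).
tree = nltk.Tree("S", [
    nltk.Tree("NP", [("The", "DT"), ("cat", "NN")]),
    ("chased", "VBD"),
    nltk.Tree("NP", [("the", "DT"), ("mouse", "NN")]),
    (".", "."),
])

for np_leaves in leaves(tree):
    # Each item is a list of (word, tag) pairs; join the words for display.
    print(np_leaves, "->", " ".join(word for word, tag in np_leaves))
```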
Each of the yielded objects will be a list of (word, tag) pairs. From there you can look for the verb.
You could also tweak the grammar above or the parser. Verbs separate noun phrases (see this diagram in Chapter 7), so you can simply take the first NP after the VBD.
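Here is a minimal sketch of the "first NP after the VBD" idea. The helper `first_np_after_verb` and the example sentence are made up for illustration. It relies on the verb not being swallowed by a chunk: with the grammar above only noun and adjective tags are chunked, so verbs stay as plain (word, tag) pairs at the top level of the tree. VBD only covers past-tense verbs, so for other forms (VBZ, VBP, ...) you would pass a different tag.

```python
import nltk

# Same grammar as the snippet above.
grammar = r"""
    NBAR:
        {<NN.*|JJ>*<NN.*>}

    NP:
        {<NBAR>}
        {<NBAR><IN><NBAR>}
"""
chunker = nltk.RegexpParser(grammar)


def first_np_after_verb(tree, verb_tag="VBD"):
    """Return the leaves of the first NP that follows a `verb_tag` token
    among the tree's top-level children, or None if there is none."""
    seen_verb = False
    for child in tree:
        if isinstance(child, nltk.Tree):
            if seen_verb and child.label() == "NP":
                return child.leaves()
        elif child[1] == verb_tag:  # child is a plain (word, tag) pair
            seen_verb = True
    return None


# Hypothetical hand-tagged sentence.
postoks = [("The", "DT"), ("old", "JJ"), ("dog", "NN"), ("chased", "VBD"),
           ("the", "DT"), ("red", "JJ"), ("ball", "NN"), (".", ".")]
tree = chunker.parse(postoks)
print(first_np_after_verb(tree))  # [('red', 'JJ'), ('ball', 'NN')]
```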
Sorry that this isn't specific to your problem, but hopefully it's a useful starting point. If you upload the file(s), I'll take another look :)
Alex Bowe