SyntaxNet: building a tree from the parse to find the root verb

I am new to Python and the world of NLP. Google's recent SyntaxNet announcement intrigued me. However, I'm having a lot of trouble understanding the documentation around SyntaxNet and related tools (NLTK, etc.).

My goal: given an input sentence such as "Wilber kicked the ball.", I would like to extract the root verb ("kicked") and the object it refers to ("ball").

I stumbled upon spacy.io, and this visualization seems to encapsulate what I'm trying to do: take the POS-tagged string and load it into some kind of tree structure, so that I can start at the root verb and walk through the sentence.

I played around with syntaxnet/demo.sh and, as suggested in this thread, commented out the last couple of lines to get CoNLL output.

I then loaded this output in a Python script (cobbled together myself, so possibly not correct):

```python
import nltk
from nltk.corpus import ConllCorpusReader

columntypes = ['ignore', 'words', 'ignore', 'ignore', 'pos']
corp = ConllCorpusReader('/Users/dgourlay/development/nlp',
                         'input.conll', columntypes)
```

I can see that I have access to corp.tagged_words(), but there is no relationship between the words. Now I'm stuck! How can I load this corpus into a tree structure?

Any help is much appreciated!

+7
python nlp syntaxnet
3 answers

This might have been better as a comment, but I don't have the required reputation yet.

I haven't used ConllCorpusReader before (would you consider uploading the file you're loading to a gist and providing a link? It would make this much easier to test), but I wrote a blog post that might help with the tree-parsing aspect: here.

In particular, you probably want to chunk each sentence. Chapter 7 of the NLTK book has more information about this, but here is an example from my blog:

```python
# This grammar is described in the paper by S. N. Kim,
# T. Baldwin, and M.-Y. Kan.
# "Evaluating n-gram based evaluation metrics for automatic
# keyphrase extraction."
# Technical report, University of Melbourne, Melbourne 2010.
grammar = r"""
    NBAR:
        {<NN.*|JJ>*<NN.*>}  # Nouns and adjectives, terminated with nouns

    NP:
        {<NBAR>}            # Above, connected with in/of/etc...
        {<NBAR><IN><NBAR>}
"""
chunker = nltk.RegexpParser(grammar)
tree = chunker.parse(postoks)  # postoks: a list of (word, POS-tag) pairs
```
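As a self-contained illustration of the chunker above, here is a minimal sketch. The POS tags are hand-written for the example sentence rather than produced by a real tagger, so treat them as assumptions:

```python
import nltk

# Same grammar as in the blog post above.
grammar = r"""
    NBAR:
        {<NN.*|JJ>*<NN.*>}  # Nouns and adjectives, terminated with nouns

    NP:
        {<NBAR>}            # Above, connected with in/of/etc...
        {<NBAR><IN><NBAR>}
"""
chunker = nltk.RegexpParser(grammar)

# Hand-tagged tokens for "Wilber kicked the ball." (tags assumed, not tagger output).
postoks = [('Wilber', 'NNP'), ('kicked', 'VBD'),
           ('the', 'DT'), ('ball', 'NN'), ('.', '.')]

tree = chunker.parse(postoks)
# Collect the leaves of every NP chunk the grammar found.
nps = [st.leaves() for st in tree.subtrees(lambda t: t.label() == 'NP')]
# nps is [[('Wilber', 'NNP')], [('ball', 'NN')]]: one NP per noun here.
```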

Note: you could also use a context-free grammar (covered in chapter 8).

Each chunk (in this example, each Noun Phrase matched by the grammar above) will be a subtree. To access these subtrees, we can use this function:

```python
def leaves(tree):
    """Finds NP (noun phrase) leaf nodes of a chunk tree."""
    # Note: on older NLTK versions use t.node instead of t.label().
    for subtree in tree.subtrees(filter=lambda t: t.label() == 'NP'):
        yield subtree.leaves()
```

Each of the yielded objects will be a list of (word, tag) pairs. From there you can find the verb.

Then you can play with the grammar above or with the parser. Verbs separate noun phrases (see this diagram in chapter 7), so you can simply access the first NP after a VBD.
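To make that last idea concrete, here is a small sketch. The chunk tree is hand-built (the shape `chunker.parse` would produce for "Wilber kicked the ball.", with assumed tags), and `first_np_after` is a hypothetical helper, not an NLTK function:

```python
from nltk import Tree

# Hand-built chunk tree for "Wilber kicked the ball." (tags assumed).
tree = Tree('S', [
    Tree('NP', [('Wilber', 'NNP')]),
    ('kicked', 'VBD'),
    ('the', 'DT'),
    Tree('NP', [('ball', 'NN')]),
    ('.', '.'),
])

def first_np_after(tree, tag_prefix='VB'):
    """Return the leaves of the first NP chunk that follows a verb."""
    seen_verb = False
    for child in tree:
        if isinstance(child, Tree):
            if seen_verb and child.label() == 'NP':
                return child.leaves()
        elif child[1].startswith(tag_prefix):
            # Plain (word, tag) leaf; remember once we pass the verb.
            seen_verb = True
    return None

# first_np_after(tree) yields the object NP: [('ball', 'NN')]
```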

Sorry the solution isn't specific to your problem, but hopefully it's a useful starting point. If you upload the file(s), I'll take another crack at it :)

+3

What you are trying to find is a dependency, namely dobj (direct object). I'm not familiar enough with SyntaxNet/Parsey yet to tell you exactly how to extract this dependency from its output, but I believe this answer may help you. In short, you can configure Parsey to use CoNLL format for its output, parse that into anything you can easily traverse, then look for the ROOT dependency to find the verb and *obj dependencies to find its objects.
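A minimal sketch of that idea, assuming a CoNLL-X-style column layout (ID, FORM, LEMMA, CPOS, POS, FEATS, HEAD, DEPREL); the rows below are hand-written for illustration, not actual Parsey output, so check your parser's real column order before relying on this:

```python
# Hand-written CoNLL-style rows for "Wilber kicked the ball." (illustrative).
conll_lines = [
    "1 Wilber _ NOUN NNP _ 2 nsubj",
    "2 kicked _ VERB VBD _ 0 ROOT",
    "3 the    _ DET  DT  _ 4 det",
    "4 ball   _ NOUN NN  _ 2 dobj",
    "5 .      _ .    .   _ 2 punct",
]
rows = [line.split() for line in conll_lines]

# The ROOT dependency marks the main verb of the sentence...
root = next(r for r in rows if r[7] == 'ROOT')

# ...and *obj dependents whose HEAD points at the root are its objects.
objects = [r[1] for r in rows
           if r[7].endswith('obj') and r[6] == root[0]]
# root[1] is 'kicked'; objects is ['ball'].
```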

+2

If you have parsed the raw text into CoNLL format with any parser, you can follow these steps to find the dependents of a node you are interested in:

  • construct an adjacency matrix from the parsed sentence;
  • find the node you are interested in (the verb, in your case) and extract the indices of its dependents from the adjacency matrix;
  • for each dependent, look up its dependency label in the eighth column of the CoNLL format.
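The steps above can be sketched as follows. The CoNLL rows are a hand-written example (not real parser output), and the column layout is assumed to be CoNLL-X-style with HEAD in column 7 and DEPREL in column 8:

```python
# Hand-written CoNLL-style rows for "Wilber kicked the ball." (illustrative).
# Columns: ID, FORM, LEMMA, CPOS, POS, FEATS, HEAD, DEPREL
rows = [
    ['1', 'Wilber', '_', 'NOUN', 'NNP', '_', '2', 'nsubj'],
    ['2', 'kicked', '_', 'VERB', 'VBD', '_', '0', 'ROOT'],
    ['3', 'the',    '_', 'DET',  'DT',  '_', '4', 'det'],
    ['4', 'ball',   '_', 'NOUN', 'NN',  '_', '2', 'dobj'],
    ['5', '.',      '_', '.',    '.',   '_', '2', 'punct'],
]
n = len(rows)

# Step 1: adjacency matrix; index 0 is the artificial ROOT node.
adj = [[False] * (n + 1) for _ in range(n + 1)]
for r in rows:
    adj[int(r[6])][int(r[0])] = True  # head -> dependent

# Step 2: dependents of the node of interest (the verb, token 2).
verb = 2
dependents = [i for i in range(1, n + 1) if adj[verb][i]]

# Step 3: dependency label of each dependent = eighth column (index 7).
labels = [(rows[i - 1][1], rows[i - 1][7]) for i in dependents]
# labels is [('Wilber', 'nsubj'), ('ball', 'dobj'), ('.', 'punct')]
```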

PS: I could provide the code, but it would be better if you try writing it yourself.

0
