How to generate multiple parsing trees for an ambiguous sentence in NLTK?

Question

I have the following code in Python.

    import nltk

    sent = [("very", "ADJ"), ("colourful", "ADJ"), ("ice", "NN"), ("cream", "NN"), ("van", "NN")]
    patterns = r"""
        NP: {<ADJ>*<NN>+}
    """
    NPChunker = nltk.RegexpParser(patterns)  # create chunk parser
    for s in NPChunker.nbest_parse(sent):
        print s.draw()

Output:

 (S (NP very/ADJ colourful/ADJ ice/NN cream/NN van/NN))

But the output should also include two more parse trees:

 (S (NP very/ADJ colourful/ADJ ice/NN) (NP cream/NN) (NP van/NN))
 (S (NP very/ADJ colourful/ADJ ice/NN cream/NN) van/NN)

The problem is that RegexpParser takes only the first match of the regular expression. How can I generate all possible parse trees at once?

python regex nlp nltk

gamma Sep 27 '13 at 18:39

1 answer

Viktor Vojnovski · Answer 1 · 2013-09-28T10:45:56+0000

This is not possible with the RegexpParser class. It inherits the nbest_parse method from the ParserI interface, and if you look at the source code ( https://github.com/nltk/nltk/blob/master/nltk/parse/api.py ), you can see that by default it just runs the parse method of the base class and returns the result as an iterable.
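To make that default concrete, here is a minimal sketch that mimics the behaviour described. It is not NLTK's actual source: the class names SingleParseBase and ToyChunker are hypothetical, and the tree is a plain tuple rather than an nltk Tree.

```python
class SingleParseBase:
    """Hypothetical stand-in for a parser interface with a default nbest_parse."""

    def parse(self, sent):
        raise NotImplementedError

    def nbest_parse(self, sent, n=None):
        # Default behaviour: run the single-result parse and wrap it in a
        # list, so callers can iterate but never see more than one tree.
        return [self.parse(sent)]


class ToyChunker(SingleParseBase):
    """Hypothetical chunker that only knows how to produce one analysis."""

    def parse(self, sent):
        # Greedy analysis: put the whole sentence into a single NP chunk.
        return ("S", ("NP", tuple(sent)))


sent = [("very", "ADJ"), ("colourful", "ADJ"), ("ice", "NN"),
        ("cream", "NN"), ("van", "NN")]
trees = ToyChunker().nbest_parse(sent)
for tree in trees:
    print(tree)
```

Because the subclass only overrides parse, iterating over nbest_parse can never yield the alternative analyses, no matter how ambiguous the input is.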

As someone tried to explain in "Chunking with nltk", the chunking classes are not the tool to use for this purpose (yet!). Look at http://nltk.org/book/ch08.html ; it has some quick examples, but they will get you only halfway to what you want to achieve, which requires a lot of pre-processing and smart design.
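As one possible direction for the "smart design" part, the alternatives can be enumerated by hand without RegexpParser. The following is only a sketch in plain Python, not an NLTK feature: the helper names is_np, all_chunkings and show are hypothetical, and the NP rule <ADJ>*<NN>+ from the question is hard-coded.

```python
def is_np(tags):
    """Return True if the tag sequence matches <ADJ>*<NN>+."""
    i = 0
    while i < len(tags) and tags[i] == "ADJ":
        i += 1
    # At least one NN must follow, and nothing but NN may remain.
    return i < len(tags) and all(t == "NN" for t in tags[i:])


def all_chunkings(sent):
    """Yield every way to split sent into NP chunks and unchunked tokens."""
    if not sent:
        yield []
        return
    # Option 1: leave the first token outside any chunk.
    for rest in all_chunkings(sent[1:]):
        yield [sent[0]] + rest
    # Option 2: start an NP chunk of every valid length at the first token.
    for end in range(1, len(sent) + 1):
        if is_np([tag for _, tag in sent[:end]]):
            for rest in all_chunkings(sent[end:]):
                yield [("NP", sent[:end])] + rest


def show(chunking):
    """Format a chunking in the bracketed style used in the question."""
    parts = []
    for item in chunking:
        if item[0] == "NP" and isinstance(item[1], list):
            words = " ".join(w + "/" + t for w, t in item[1])
            parts.append("(NP " + words + ")")
        else:
            parts.append(item[0] + "/" + item[1])
    return "(S " + " ".join(parts) + ")"


sent = [("very", "ADJ"), ("colourful", "ADJ"), ("ice", "NN"),
        ("cream", "NN"), ("van", "NN")]
results = [show(c) for c in all_chunkings(sent)]
for line in results:
    print(line)
```

This prints every segmentation the rule allows, including the three trees from the question as well as others in which more tokens are left unchunked, so some filtering or ranking would still be needed on top of it.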
