English grammar for parsing in NLTK

Is there a ready-made English grammar that I can simply download and use in NLTK? I searched for parsing examples using NLTK, but it seems to me that I have to manually specify the grammar before parsing the sentence.

Thank you so much!

+56
python nlp nltk grammar
May 24 '11 at 19:17
7 answers

You can have a look at pyStatParser , a simple statistical Python parser that returns NLTK parse trees. It comes with public treebanks, and it generates the grammar model only the first time you instantiate a Parser object (in about 8 seconds). It uses the CKY algorithm and parses average-length sentences (like the one below) in under a second.

 >>> from stat_parser import Parser
 >>> parser = Parser()
 >>> print parser.parse("How can the net amount of entropy of the universe be massively decreased?")
 (SBARQ
   (WHADVP (WRB how))
   (SQ
     (MD can)
     (NP
       (NP (DT the) (JJ net) (NN amount))
       (PP
         (IN of)
         (NP
           (NP (NNS entropy))
           (PP (IN of) (NP (DT the) (NN universe))))))
     (VP (VB be) (ADJP (RB massively) (VBN decreased))))
   (. ?))
+31
Jul 29 '13 at 22:52

My library, spaCy, provides a high-performance parser.

Installation:

 pip install spacy
 python -m spacy.en.download all

Usage:

 from spacy.en import English

 nlp = English()
 doc = nlp(u'A whole document.\nNo preprocessing required. Robust to arbitrary formatting.')
 for sent in doc:
     for token in sent:
         if token.is_alpha:
             print token.orth_, token.tag_, token.head.lemma_

Choi et al. (2015) found that spaCy is the fastest dependency parser available. It processes over 13,000 sentences per second on a single thread. On the standard WSJ evaluation it scores 92.7%, more than 1% more accurate than any of CoreNLP's models.
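Note that the spacy.en module and the download command above come from an older release. On current spaCy versions the equivalent looks roughly like the sketch below; this assumes spaCy 3.x and that the en_core_web_sm model has been installed with python -m spacy download en_core_web_sm.

 import spacy

 # Assumes: pip install spacy && python -m spacy download en_core_web_sm
 nlp = spacy.load("en_core_web_sm")
 doc = nlp("A whole document. No preprocessing required.")
 for sent in doc.sents:              # sentence segmentation comes for free
     for token in sent:
         if token.is_alpha:
             # surface form, fine-grained POS tag, lemma of the syntactic head
             print(token.orth_, token.tag_, token.head.lemma_)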

+20
08 Sep '15 at 20:25

There is a library called Pattern. It is pretty fast and easy to use.

 >>> from pattern.en import parse
 >>> s = 'The mobile web is more important than mobile apps.'
 >>> s = parse(s, relations=True, lemmata=True)
 >>> print s
 'The/DT/B-NP/O/NP-SBJ-1/the mobile/JJ/I-NP/O/NP-SBJ-1/mobile' ...
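If you want the chunk tree rather than the tagged string, Pattern also provides parsetree(). A small sketch, assuming the Sentence/Chunk attribute names from Pattern's documentation:

 >>> from pattern.en import parsetree
 >>> # parsetree() returns Sentence objects carrying chunk structure
 >>> for sentence in parsetree('The mobile web is more important than mobile apps.', relations=True, lemmata=True):
 ...     for chunk in sentence.chunks:
 ...         print chunk.type, [(w.string, w.type) for w in chunk.words]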
+7
Jul 25

nltk_data has several grammars. In the Python interpreter, enter nltk.download() .
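Once a grammar package is downloaded, you can load it and hand it to one of NLTK's parsers. A minimal sketch, assuming the large_grammars package (which ships a sample ATIS CFG); the package and file names come from the NLTK data index and will differ for other grammars:

 import nltk

 nltk.download('large_grammars')    # one of the grammar packages in nltk_data
 grammar = nltk.data.load('grammars/large_grammars/atis.cfg')
 parser = nltk.parse.ChartParser(grammar)

 # the sentence may only use words that occur in the grammar's lexicon
 tokens = 'show me northwest flights to detroit .'.split()
 for tree in parser.parse(tokens):
     print(tree)
     break   # the ATIS grammar is highly ambiguous, so stop at the first parse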

+5
May 24 '11 at 19:25

Use MaltParser. It comes with a pre-trained English grammar, as well as pre-trained models for some other languages. And MaltParser is a dependency parser, not just a simple bottom-up or top-down parser.

Just download MaltParser from http://www.maltparser.org/index.html and use it from NLTK as follows:

 import nltk
 parser = nltk.parse.malt.MaltParser()
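Recent NLTK versions will not find everything on their own: MaltParser() needs the path to the unpacked MaltParser distribution and to a pre-trained model such as engmalt.linear-1.7.mco (available from the MaltParser site). A hedged sketch; both paths below are placeholders for your own setup:

 from nltk.parse.malt import MaltParser

 # both paths are assumptions about where you unpacked the jar and the model
 parser = MaltParser('/path/to/maltparser-1.9.2',
                     '/path/to/engmalt.linear-1.7.mco')
 graph = parser.parse_one('I saw a bird from my window .'.split())
 print(graph.tree())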
+4
Aug 08

I have tried NLTK, pyStatParser, and Pattern. IMHO Pattern is the best English parser introduced in this thread: it supports pip install and there is good documentation on the website ( http://www.clips.ua.ac.be/pages/pattern-en ). I could not find reasonable documentation for NLTK (and it gave me inaccurate results by default, and I could not find how to tune it). pyStatParser is much slower than described above in my environment (it took about one minute to initialize, and a couple of seconds to parse long sentences; maybe I did not use it correctly).

+4
Nov 10 '14 at 23:02

Have you tried POS tagging in NLTK?

 import nltk
 from nltk import word_tokenize

 text = word_tokenize("And now for something completely different")
 nltk.pos_tag(text)

The output is:

 [('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'), ('completely', 'RB'), ('different', 'JJ')]

Here is an example: NLTK_chapter03

+3
Oct 24 '17 at 18:14
