NLTK - Download all NLTK data except corpora from the command line without the Downloader user interface

Question


We can download all NLTK data using:

 > import nltk
 > nltk.download('all')

Or specific data using:

 > nltk.download('punkt')
 > nltk.download('maxent_treebank_pos_tagger')

But I want to download all the data except the corpora files, for example all chunkers, grammars, models, stemmers, taggers, tokenizers, etc.

Is there a way to do this without the Downloader interface? Something like:

 > nltk.download('all-taggers')


python nlp nltk corpus

Ravi Jun 25 '16 at 16:46


1 answer

Ravi · Accepted Answer · 2016-07-30T19:55:27+0000

List all the corpus IDs and set _status_cache[pkg.id] = 'installed' for each one.

This marks every corpus package as 'installed', so those packages are skipped when you call download() on the same Downloader instance.

If you don't know which corpus or package you need, use nltk.download('popular') instead of downloading all corpora and models.

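    import nltk

    dwlr = nltk.downloader.Downloader()

    # Mark every corpus as already installed so the downloader skips it.
    for pkg in dwlr.corpora():
        dwlr._status_cache[pkg.id] = 'installed'

    # Download the 'popular' collection; the corpora in it are skipped.
    dwlr.download('popular')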

To download all packages in a specific folder:

    import nltk

    dwlr = nltk.downloader.Downloader()

    # Possible subdir values: chunkers, corpora, grammars, help, misc,
    # models, sentiment, stemmers, taggers, tokenizers
    for pkg in dwlr.packages():
        if pkg.subdir == 'taggers':
            dwlr.download(pkg.id)
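The same subdir check can be inverted to get what the question asks for, everything except the corpora. This is only a minimal sketch: the helper names non_corpus_ids and download_non_corpora are made up for illustration and are not part of NLTK, and it assumes each package object exposes the id and subdir attributes used above.

```python
def non_corpus_ids(packages, excluded_subdirs=("corpora",)):
    """Return the IDs of packages whose subdir is not excluded.

    `packages` is any iterable of objects with `id` and `subdir`
    attributes, such as the items yielded by Downloader.packages().
    """
    return [pkg.id for pkg in packages if pkg.subdir not in excluded_subdirs]


def download_non_corpora():
    """Download every package that does not live in the corpora folder.

    Hypothetical helper: it needs NLTK installed and network access,
    so the import is done lazily inside the function.
    """
    import nltk

    dwlr = nltk.downloader.Downloader()
    for pkg_id in non_corpus_ids(dwlr.packages()):
        dwlr.download(pkg_id)
```

Calling download_non_corpora() should then fetch the chunkers, grammars, models, stemmers, taggers, tokenizers and so on, without the Downloader UI. Passing a longer tuple as excluded_subdirs would skip more folders in the same way.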
