NLTK - Download all nltk data except corpora from the command line without the Downloader user interface

We can download all nltk data using:

    >>> import nltk
    >>> nltk.download('all')

Or specific data using:

    >>> nltk.download('punkt')
    >>> nltk.download('maxent_treebank_pos_tagger')

But I want to download all the data except for the corpora files, for example, all chunkers, grammars, models, stemmers, taggers, tokenizers, etc.

Is there a way to do this without the Downloader interface? Something like:

    >>> nltk.download('all-taggers')
1 answer

List all the corpus IDs and set _status_cache[pkg.id] = 'installed' for each of them.

This marks every corpus package as 'installed', so those packages are skipped when you download through the same Downloader instance (dwlr.download() in the code below).

If you don't know which corpus or package you need, use the 'popular' collection instead of downloading all corpora and models.

    import nltk

    dwlr = nltk.downloader.Downloader()

    # Mark every corpus as already installed so the downloader skips it
    for pkg in dwlr.corpora():
        dwlr._status_cache[pkg.id] = 'installed'

    dwlr.download('popular')

To download all packages in a specific folder, filter on pkg.subdir:

    import nltk

    dwlr = nltk.downloader.Downloader()

    # chunkers, corpora, grammars, help, misc,
    # models, sentiment, stemmers, taggers, tokenizers
    for pkg in dwlr.packages():
        if pkg.subdir == 'taggers':
            dwlr.download(pkg.id)
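The same subdir filter can be inverted to get what the question asks for, i.e. everything except the corpora. The following is only a sketch under assumptions: the helper names non_corpus_ids and download_all_except_corpora are made up for illustration and are not part of NLTK, and the only NLTK calls relied on are the ones already used above (Downloader.packages(), Downloader.download(), pkg.id, pkg.subdir).

```python
def non_corpus_ids(packages, excluded_subdirs=("corpora",)):
    """Return the IDs of all packages whose subdir is not excluded.

    Hypothetical helper: works on any objects exposing .id and .subdir,
    such as the package objects yielded by Downloader.packages().
    """
    return [pkg.id for pkg in packages if pkg.subdir not in excluded_subdirs]


def download_all_except_corpora():
    """Download every package that does not live in the 'corpora' folder.

    Hypothetical wrapper: needs nltk installed and network access.
    """
    from nltk.downloader import Downloader

    dwlr = Downloader()
    for pkg_id in non_corpus_ids(dwlr.packages()):
        dwlr.download(pkg_id)
```

Calling download_all_except_corpora() fetches the package index and then downloads the remaining packages one by one. Passing a different excluded_subdirs tuple to non_corpus_ids, for example ("corpora", "help"), would skip additional folders as well.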