NLTK - how to find out which packages are installed from python?

I am trying to load some of the packages that I installed with the NLTK installer, but I get:

>>> from nltk.corpus import machado
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name machado

But in the download manager (nltk.download()), the machado package is marked as installed, and I have the nltk_data/corpus/machado folder.

How can I see, from inside the Python interpreter, which packages are installed?

Also, what package should I install to work with this guide? http://nltk.googlecode.com/svn/trunk/doc/howto/portuguese_en.html

I cannot find the nltk.examples module that is referenced in the instructions.

+7
python nlp nltk corpus
2 answers

Try:

    import nltk.corpus
    dir(nltk.corpus)

At that point it will probably tell you something about __LazyModule__..., so run dir(nltk.corpus) again.

If that does not work, try tab completion in IPython.

+9

NLTK includes the nltk.corpus package, which contains the definitions of corpus readers (for example, PlaintextCorpusReader). This package also includes a large list of predefined access points for corpora that can be downloaded using nltk.download(). These access points (for example, nltk.corpus.brown) are defined whether or not the corresponding corpus has been downloaded.

  • To see which access points are defined in NLTK, use dir(nltk.corpus) (after import nltk).

  • To see which corpora you actually have installed, try the following:

        import os
        import nltk
        print(os.listdir(nltk.data.find("corpora")))

    This simply prints a list of the contents of the nltk_data/corpora folder. You can take it from there.

  • If you installed your own corpus under nltk_data/corpora and NLTK does not know about it, you need to instantiate the appropriate reader yourself. For example, if it is plain text in corpora/mycorpus and all the files end in .txt, you would do it like this:

        import nltk
        from nltk.corpus import PlaintextCorpusReader

        # Locate the corpus folder under nltk_data, then read every .txt file in it.
        mypath = nltk.data.find("corpora/mycorpus")
        mycorpus = PlaintextCorpusReader(mypath, r".*\.txt$")

    In that case, though, you can place your own corpus anywhere and point mypath at it directly, instead of asking NLTK to find it.
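If you want something a bit tidier than the raw directory listing, here is a rough sketch of a helper. The name installed_corpora is made up for illustration and is not part of NLTK. It assumes the corpora folder may hold both a downloaded name.zip and an unpacked name/ folder for the same corpus, and it assumes the pointer returned by nltk.data.find("corpora") converts to a filesystem path with str().

```python
import os


def installed_corpora(corpora_dir):
    """Return the sorted, de-duplicated corpus names found in corpora_dir.

    Hypothetical helper, not an NLTK function. A corpus counts as present
    if there is either a sub-folder or a .zip archive with its name.
    """
    names = set()
    for entry in os.listdir(corpora_dir):
        full_path = os.path.join(corpora_dir, entry)
        if os.path.isdir(full_path):
            names.add(entry)
        elif entry.endswith(".zip"):
            names.add(entry[: -len(".zip")])
    return sorted(names)


if __name__ == "__main__":
    try:
        import nltk

        # nltk.data.find raises LookupError if no corpora folder is found.
        # Assumption: str() on the returned pointer gives the folder path.
        print(installed_corpora(str(nltk.data.find("corpora"))))
    except (ImportError, LookupError) as err:
        print("Could not list corpora:", err)
```

Comparing that list with dir(nltk.corpus) should show whether a corpus is on disk but has no access point defined in your NLTK version, which may be what is happening with machado.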

+3