How to configure nltk data directory from code?

+66
python directory path nlp nltk
Aug 19 '10 at 13:42
6 answers

Just modify the entries of nltk.data.path; it is an ordinary Python list.
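
A minimal sketch of that idea. The helper name add_data_dir and the directory "/srv/nltk_data" are hypothetical, chosen only for illustration. Because the directories are checked in order, inserting at the front gives your directory priority over the defaults, while appending makes it a fallback.

```python
from typing import List


def add_data_dir(search_path: List[str], directory: str, prepend: bool = True) -> None:
    """Add `directory` to `search_path` in place, skipping duplicates."""
    if directory in search_path:
        return
    if prepend:
        search_path.insert(0, directory)  # checked before the existing entries
    else:
        search_path.append(directory)  # checked after the existing entries


try:
    import nltk
except ImportError:
    nltk = None  # NLTK is not installed; the helper still works on any list

if nltk is not None:
    # "/srv/nltk_data" is a placeholder, not a real default location
    add_data_dir(nltk.data.path, "/srv/nltk_data")
    print(nltk.data.path)
```

The change only affects the current Python process, so it has to run before the first resource is loaded.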

+61
Oct 11 '10 at 5:55

From the code http://www.nltk.org/_modules/nltk/data.html :

 ``nltk:path``: Specifies the file stored in the NLTK data package at *path*. NLTK will search for these files in the directories specified by ``nltk.data.path``. 

Then in the code:

    ######################################################################
    # Search Path
    ######################################################################

    path = []
    """A list of directories where the NLTK data package might reside.
       These directories will be checked in order when looking for a
       resource in the data package.  Note that this allows users to
       substitute in their own versions of resources, if they have them
       (eg, in their home directory under ~/nltk_data)."""

    # User-specified locations:
    path += [d for d in os.environ.get('NLTK_DATA', str('')).split(os.pathsep) if d]
    if os.path.expanduser('~/') != '~/':
        path.append(os.path.expanduser(str('~/nltk_data')))

    if sys.platform.startswith('win'):
        # Common locations on Windows:
        path += [
            str(r'C:\nltk_data'), str(r'D:\nltk_data'), str(r'E:\nltk_data'),
            os.path.join(sys.prefix, str('nltk_data')),
            os.path.join(sys.prefix, str('lib'), str('nltk_data')),
            os.path.join(os.environ.get(str('APPDATA'), str('C:\\')), str('nltk_data'))
        ]
    else:
        # Common locations on UNIX & OS X:
        path += [
            str('/usr/share/nltk_data'),
            str('/usr/local/share/nltk_data'),
            str('/usr/lib/nltk_data'),
            str('/usr/local/lib/nltk_data')
        ]
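
The "User-specified locations" line in that excerpt can be tried on its own. The sketch below mirrors only that one expression; the function name dirs_from_nltk_data and the example directories are hypothetical. It shows that NLTK_DATA may hold several directories joined by os.pathsep (":" on UNIX, ";" on Windows) and that empty entries are dropped.

```python
import os
from typing import List, Optional


def dirs_from_nltk_data(value: Optional[str] = None) -> List[str]:
    """Split an NLTK_DATA-style value into directories, dropping empty entries."""
    if value is None:
        value = os.environ.get("NLTK_DATA", "")
    return [d for d in value.split(os.pathsep) if d]


# Placeholder directories, joined with the platform's path separator
example = os.pathsep.join(["/opt/nltk_data", "/srv/shared/nltk_data"])
print(dirs_from_nltk_data(example))
```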

To change the search path, append your own directory to the list:

    import nltk
    nltk.data.path.append("/home/yourusername/whateverpath/")

Or on Windows (use a raw string so that sequences such as \f and \a are not treated as escapes):

    import nltk
    nltk.data.path.append(r"C:\somewhere\farfar\away\path")
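
A note on Windows paths in Python string literals: in an ordinary (non-raw) literal, "\f" is a form feed and "\a" is a bell character, so a path containing \farfar or \away would be silently altered. The short sketch below, using the same illustrative path, shows three spellings that avoid the problem.

```python
from pathlib import PureWindowsPath

# "\f" is a single form-feed character, so this literal has 6 characters, not 7
assert len("\farfar") == 6

# Option 1: double every backslash
escaped = "C:\\somewhere\\farfar\\away\\path"

# Option 2: a raw string leaves backslashes untouched
raw = r"C:\somewhere\farfar\away\path"
assert raw == escaped

# Option 3: write forward slashes and let pathlib normalise them
forward = PureWindowsPath("C:/somewhere/farfar/away/path")
assert str(forward) == raw

print(raw)
```
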
+37
Apr 10 '14 at 11:59

I use append, for example:

    nltk.data.path.append('/libs/nltk_data/')
+22
Apr 10 '14 at 5:17

Instead of adding nltk.data.path.append('your/path/to/nltk_data') to each script, you can set the NLTK_DATA environment variable, which NLTK reads when it builds its search path.

Open ~/.bashrc (or ~/.profile) with a text editor (e.g. nano, vim, gedit) and add the following line:

    export NLTK_DATA="your/path/to/nltk_data"

Run source to load the environment variable into the current shell:

    source ~/.bashrc


Test

Open python and execute the following lines:

    import nltk
    nltk.data.path

Your NLTK data path should now appear in the list.
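
If editing shell start-up files is not an option, the same variable can be set from Python itself. This is a sketch under one assumption: the search path is built when nltk.data is first imported, so the variable must be set before the first import nltk in the process. The helper name set_nltk_data_env and the directory are hypothetical.

```python
import os
from typing import Iterable


def set_nltk_data_env(directories: Iterable[str]) -> str:
    """Set NLTK_DATA for this process; call before the first `import nltk`."""
    value = os.pathsep.join(directories)
    os.environ["NLTK_DATA"] = value
    return value


# "/srv/nltk_data" is a placeholder directory
set_nltk_data_env(["/srv/nltk_data"])

try:
    import nltk  # imported only after the variable is set

    print(nltk.data.path)
except ImportError:
    print("NLTK is not installed; NLTK_DATA =", os.environ["NLTK_DATA"])
```

Child processes started afterwards inherit the variable as well.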

Link: @alvations' answer to nltk/nltk#1997

+4
Apr 02 '18 at 9:23

For uwsgi users:

I had problems because I wanted the uwsgi application (which runs as a different user than me) to have access to the nltk data that I had previously downloaded. For me, adding the following line to myapp_uwsgi.ini worked:

    env = NLTK_DATA=/home/myuser/nltk_data/

This sets the NLTK_DATA environment variable, as suggested by @schemacs.
You may need to restart your uwsgi process after making this change.
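
When the application runs as another user, setting the variable is only half of the problem: that user also needs permission to read the directory. A small diagnostic sketch (the function name readable_data_dirs is hypothetical) that can be run from inside the application to check both at once:

```python
import os
from typing import Dict


def readable_data_dirs(env_value: str) -> Dict[str, bool]:
    """Map each directory in an NLTK_DATA-style value to whether the
    current user can enter and read it."""
    result = {}
    for directory in env_value.split(os.pathsep):
        if directory:
            result[directory] = os.path.isdir(directory) and os.access(
                directory, os.R_OK | os.X_OK
            )
    return result


# An empty dict means NLTK_DATA is not set for this process
print(readable_data_dirs(os.environ.get("NLTK_DATA", "")))
```

A False value points at a missing directory or a permissions problem rather than at the uwsgi configuration.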

+1
Jun 20 '17 at 20:46

Another solution is to get ahead of the problem and choose the directory at download time:

    import nltk
    nltk.download()

When the downloader window appears, you can specify the directory the data should be downloaded to.
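
On a machine without a display, the directory can also be passed in code through the download_dir argument of nltk.download. A minimal sketch: the helper name download_to is hypothetical, and any package name or directory you pass is your own choice. The call needs network access, so the helper is only defined here, not invoked.

```python
import os


def download_to(package: str, directory: str) -> bool:
    """Download one NLTK package into `directory` and make it findable."""
    import nltk  # imported lazily so that defining the helper does not require NLTK

    os.makedirs(directory, exist_ok=True)
    ok = nltk.download(package, download_dir=directory)
    # A non-default directory is not searched automatically, so register it
    if directory not in nltk.data.path:
        nltk.data.path.append(directory)
    return bool(ok)
```

A call would look like download_to("punkt", "/srv/nltk_data"), where both arguments are placeholders.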

0
Jun 05 '19 at 12:15


