NLTK gives << raise URLError('unknown url type: %s' % type) >> in Python

This follows on from my comments on "NLTK v3.2: Failed to execute nltk.pos_tag()" and is based on the code provided as an answer in "Extract city names from text using python".


I ran the code from alvas's answer in "NLTK v3.2: Failed to execute nltk.pos_tag()" and the hack worked fine, but when I try to run my own NLTK code it still gives

 raise URLError('unknown url type: %s' % type)

... I also ran Sarim Hussein's suggestion

 nltk.download('averaged_perceptron_tagger') 

successfully, but no luck. - GeorgeC 2 days ago

Try updating nltk: pip install -U nltk - alvas yesterday

Just tried it and got a different error. On the pip command, I get

 C:\Python27\Scripts> pip install -U nltk
 Requirement already up-to-date: nltk in c:\python27\lib\site-packages

When running Python in Idle or Pyscripter I get

 Traceback (most recent call last):
   File "E:\SBTF\ntlk_test.py", line 19, in <module>
     tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
   File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\tag\__init__.py", line 110, in pos_tag
     tagger = PerceptronTagger()
   File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\tag\perceptron.py", line 141, in __init__
     self.load(AP_MODEL_LOC)
   File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\tag\perceptron.py", line 209, in load
     self.model.weights, self.tagdict, self.classes = load(loc)
   File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\data.py", line 801, in load
     opened_resource = _open(resource_url)
   File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\data.py", line 924, in _open
     return urlopen(resource_url)
   File "C:\Python27\ArcGIS10.4\lib\urllib2.py", line 154, in urlopen
     return opener.open(url, data, timeout)
   File "C:\Python27\ArcGIS10.4\lib\urllib2.py", line 431, in open
     response = self._open(req, data)
   File "C:\Python27\ArcGIS10.4\lib\urllib2.py", line 454, in _open
     'unknown_open', req)
   File "C:\Python27\ArcGIS10.4\lib\urllib2.py", line 409, in _call_chain
     result = func(*args)
   File "C:\Python27\ArcGIS10.4\lib\urllib2.py", line 1265, in unknown_open
     raise URLError('unknown url type: %s' % type)
 URLError: <urlopen error unknown url type: c>

- GeorgeC 16 hours ago [the above differs from what I reported earlier, before restarting the computer]
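A note on the "unknown url type: c" part of the traceback: it is consistent with a bare Windows path such as C:\... being handed to a URL opener, so that the drive letter is read as the URL scheme. The following is only a minimal sketch of that parsing behaviour, not a reproduction of NLTK's internals; the model path and the helper name url_scheme are made up for illustration.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7, as used in the question


def url_scheme(resource):
    """Return the scheme that a URL parser sees in `resource`."""
    return urlparse(resource).scheme


# Hypothetical model location, for illustration only.
model_path = r"C:\nltk_data\taggers\averaged_perceptron_tagger\averaged_perceptron_tagger.pickle"

# A bare Windows path: the drive letter before the colon is taken as the scheme.
bare_scheme = url_scheme(model_path)

# The same path with an explicit file: prefix, as in the workaround quoted in the question.
file_scheme = url_scheme("file:" + model_path)

print("bare path scheme: %s" % bare_scheme)
print("prefixed scheme:  %s" % file_scheme)
```

If this reading is right, the error should go away only when the code path that actually loads the model receives a location with the file: prefix.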

What OS are you using?

Windows 10

What is your version of Python?

2.7

How did you install python?

installed through ArcGIS 10.4, as well as through the OSGEO4W installer (with QGIS)

How did you install NLTK: via pip or conda? Where do you run Python: from the command line, a terminal, or an IDE?

Idle and Pyscripter, also directly from QGIS and ArcGIS.

Do you run it through a server or cloud? Do you use it on your laptop / computer?

An i7 laptop with 16 GB of RAM and more than 500 GB of free disk space.

Or at a school or similar place where there may be a firewall?

No, my own network without a firewall.

Where do you run the Python script? Do you have another file named nltk.py in your directory? - alvas 16 hours ago
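One way to answer these questions concretely: the pip output above reports c:\python27\lib\site-packages, while the traceback points at C:\Python27\ArcGIS10.4\lib\site-packages, so it may be that pip upgraded a different installation from the one Idle, Pyscripter, and ArcGIS actually import from. The sketch below reports which interpreter is running and which file a module is loaded from; describe_module is a hypothetical helper, and "json" is used only so that the snippet runs anywhere. Replace it with "nltk" in each environment you want to compare.

```python
import importlib
import sys


def describe_module(name):
    """Import `name` and report where it was loaded from."""
    module = importlib.import_module(name)
    return {
        "interpreter": sys.executable,
        "module_file": getattr(module, "__file__", None),
        "version": getattr(module, "__version__", None),
    }


# "json" is a stand-in; use "nltk" to check the real installation.
info = describe_module("json")
for key in sorted(info):
    print("%s: %s" % (key, info[key]))
```

If module_file points into your script's own directory rather than site-packages, a local file is shadowing the library. If it points at a different site-packages from the one pip reported, the upgrade went to another Python installation.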

  After upgrading to NLTK 3.2 did you use the AP_MODEL_LOC = 'file:'+str(find('taggers/averaged_perceptron_tagger/'+PICKLE)) hack? 

- alvas 16 hours ago

Yes. See the code below for what I get.

  Sorry for the multiple questions, but your short comment isn't enough to help us debug the problem. Please answer each of the questions in the previous 2 comments, and we will try to find a solution afterwards. In fact, it would also be easier if you asked another question and stated the answers to all of the questions from these comments there; it seems like this is a different problem. - alvas 16 hours ago

How did you install NLTK? You installed via pip

No worries, thanks for your time.

In the Python module of ArcGIS I get

 >>> from nltk.tag import PerceptronTagger
 >>> from nltk.data import find
 >>> PICKLE = "averaged_perceptron_tagger.pickle"
 >>> AP_MODEL_LOC = 'file:'+str(find('taggers/averaged_perceptron_tagger/'+PICKLE))
 >>> tagger = PerceptronTagger(load=False)
 >>> tagger.load(AP_MODEL_LOC)
 >>> pos_tag = tagger.tag
 >>> pos_tag('The quick brown fox jumps over the lazy dog'.split())
 [('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')]
 >>> def extract_entity_names(t):
 ...     entity_names = []
 ...
 ...     if hasattr(t, 'label') and t.label:
 ...         if t.label() == 'NE':
 ...             entity_names.append(' '.join([child[0] for child in t]))
 ...         else:
 ...             for child in t:
 ...                 entity_names.extend(extract_entity_names(child))
 ...
 ...     return entity_names
 ...
 >>> with open('sample.txt', 'r') as f:
 ...     for line in f:
 ...         sentences = nltk.sent_tokenize(line)
 ...         tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
 ...         tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
 ...         chunked_sentences = nltk.ne_chunk_sents(tagged_sentences, binary=True)
 ...
 ...         entities = []
 ...         for tree in chunked_sentences:
 ...             entities.extend(extract_entity_names(tree))
 ...
 ...         print(entities)
 ...
 Runtime error
 Traceback (most recent call last):
   File "<string>", line 1, in <module>
 IOError: [Errno 2] No such file or directory: 'sample.txt'
 >>> import os
 >>> os.getcwd()
 'C:\\Program Files (x86)\\ArcGIS\\Desktop10.4\\bin'
 >>> os.chdir(r'E:\SBTF')
 >>> os.getcwd()
 'E:\\SBTF'
 >>> with open('sample.txt', 'r') as f:
 ...     for line in f:
 ...         sentences = nltk.sent_tokenize(line)
 ...         tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
 ...         tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
 ...         chunked_sentences = nltk.ne_chunk_sents(tagged_sentences, binary=True)
 ...
 ...         entities = []
 ...         for tree in chunked_sentences:
 ...             entities.extend(extract_entity_names(tree))
 ...
 ...         print(entities)
 ...
 Runtime error
 Traceback (most recent call last):
   File "<string>", line 3, in <module>
 NameError: name 'nltk' is not defined
 >>> import nltk
 >>> with open('sample.txt', 'r') as f:
 ...     for line in f:
 ...         sentences = nltk.sent_tokenize(line)
 ...         tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
 ...         tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
 ...         chunked_sentences = nltk.ne_chunk_sents(tagged_sentences, binary=True)
 ...
 ...         entities = []
 ...         for tree in chunked_sentences:
 ...             entities.extend(extract_entity_names(tree))
 ...
 ...         print(entities)
 ...
 Runtime error
 Traceback (most recent call last):
   File "<string>", line 5, in <module>
   File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\tag\__init__.py", line 110, in pos_tag
     tagger = PerceptronTagger()
   File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\tag\perceptron.py", line 141, in __init__
     self.load(AP_MODEL_LOC)
   File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\tag\perceptron.py", line 209, in load
     self.model.weights, self.tagdict, self.classes = load(loc)
   File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\data.py", line 801, in load
     opened_resource = _open(resource_url)
   File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\data.py", line 924, in _open
     return urlopen(resource_url)
   File "C:\Python27\ArcGIS10.4\Lib\urllib2.py", line 154, in urlopen
     return opener.open(url, data, timeout)
   File "C:\Python27\ArcGIS10.4\Lib\urllib2.py", line 431, in open
     response = self._open(req, data)
   File "C:\Python27\ArcGIS10.4\Lib\urllib2.py", line 454, in _open
     'unknown_open', req)
   File "C:\Python27\ArcGIS10.4\Lib\urllib2.py", line 409, in _call_chain
     result = func(*args)
   File "C:\Python27\ArcGIS10.4\Lib\urllib2.py", line 1265, in unknown_open
     raise URLError('unknown url type: %s' % type)
 URLError: <urlopen error unknown url type: c>
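Reading the session above, the workaround seems to affect only the tagger object that was loaded by hand: the local pos_tag name tags the test sentence fine, but the loop still calls nltk.pos_tag, which according to the traceback builds a fresh PerceptronTagger() and loads the model from the library's own AP_MODEL_LOC. A possible way around this, sketched under that assumption, is to pass the hand-loaded tagger into the loop instead. The names load_perceptron_tagger, tag_sentences, and fake_tag are hypothetical; the NLTK calls inside load_perceptron_tagger are the ones already shown in the session.

```python
def load_perceptron_tagger():
    """Load the tagger through the file: workaround quoted in the question."""
    # Imported lazily so the rest of this sketch runs without NLTK installed.
    from nltk.data import find
    from nltk.tag import PerceptronTagger

    pickle_name = "averaged_perceptron_tagger.pickle"
    model_loc = "file:" + str(find("taggers/averaged_perceptron_tagger/" + pickle_name))
    tagger = PerceptronTagger(load=False)
    tagger.load(model_loc)
    return tagger.tag


def tag_sentences(tokenized_sentences, tag):
    """Tag each tokenized sentence with the supplied callable, not nltk.pos_tag."""
    return [tag(tokens) for tokens in tokenized_sentences]


def fake_tag(tokens):
    """Stand-in tagger used only to show the call shape; it labels everything NN."""
    return [(token, "NN") for token in tokens]


tagged = tag_sentences([["quick", "fox"], ["lazy", "dog"]], fake_tag)
print(tagged)
```

In the real script, the line that builds tagged_sentences would become something like tagged_sentences = tag_sentences(tokenized_sentences, load_perceptron_tagger()), with the tagger loaded once before the loop rather than once per line.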

String.py is in the following location: [image]
