Handling IncompleteRead, URLError

This is part of a web-scraping script:

def printer(q, missing):
    while 1:
        tmpurl = q.get()
        try:
            image = urllib2.urlopen(tmpurl).read()
        except httplib.HTTPException:
            missing.put(tmpurl)
            continue
        wf = open(tmpurl[-35:] + ".jpg", "wb")
        wf.write(image)
        wf.close()

q is a Queue() of URLs, and missing is an empty Queue for collecting the URLs that raise errors.

The function runs in parallel in 10 threads.
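Roughly, the surrounding setup looks like this (simplified; the thread-starting code below is just illustrative, not my exact script):

import urllib2
import httplib
from Queue import Queue
from threading import Thread

q = Queue()        # filled elsewhere with the image URLs
missing = Queue()  # collects the URLs that raised errors

# start 10 worker threads, each running printer()
for _ in range(10):
    t = Thread(target=printer, args=(q, missing))
    t.daemon = True
    t.start()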

Every time I run it, I get this traceback:

  File "C:\Python27\lib\socket.py", line 351, in read data = self._sock.recv(rbufsize) File "C:\Python27\lib\httplib.py", line 541, in read return self._read_chunked(amt) File "C:\Python27\lib\httplib.py", line 592, in _read_chunked value.append(self._safe_read(amt)) File "C:\Python27\lib\httplib.py", line 649, in _safe_read raise IncompleteRead(''.join(s), amt) IncompleteRead: IncompleteRead(5274 bytes read, 2918 more expected) 

But I am already using except ... . I also tried catching other exceptions, such as:

    httplib.IncompleteRead
    urllib2.URLError

and even:

 image=urllib2.urlopen(tmpurl,timeout=999999).read() 

but none of this works.

How can I catch IncompleteRead and URLError?

1 answer

I think the correct answer to this question depends on what you consider an "error-causing URL".

Catching multiple exceptions

If you think that any URL that throws an exception should be added to the missing queue, then you can do:

try:
    image = urllib2.urlopen(tmpurl).read()
except (httplib.HTTPException, httplib.IncompleteRead, urllib2.URLError):
    missing.put(tmpurl)
    continue

This will catch any of those three exceptions and add the URL to the missing queue. Even more simply, you could do:

try:
    image = urllib2.urlopen(tmpurl).read()
except:
    missing.put(tmpurl)
    continue

to catch any exception, but a bare except like this is not considered Pythonic and may hide other errors in your code.
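If you really do want a catch-all, a somewhat safer middle ground (my own suggestion, not something from your script) is to catch Exception and at least report what went wrong before moving on:

try:
    image = urllib2.urlopen(tmpurl).read()
except Exception as e:
    # still very broad, but the failure is at least visible
    print "failed to fetch %s: %r" % (tmpurl, e)
    missing.put(tmpurl)
    continue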

If by "error causing URL" you mean any URL that causes an httplib.HTTPException error, but you still want to continue processing if other errors are received, you can do:

try:
    image = urllib2.urlopen(tmpurl).read()
except httplib.HTTPException:
    missing.put(tmpurl)
    continue
except (httplib.IncompleteRead, urllib2.URLError):
    continue

This will only add the URL to the missing queue if it raises an httplib.HTTPException, but will otherwise catch httplib.IncompleteRead and urllib2.URLError so your script won't crash.

Iterating over the queue

As an aside, while 1 loops always look a little suspect to me. You should be able to loop through the contents of the queue using the following pattern, although you're free to keep doing it your own way:

for tmpurl in iter(q.get, "STOP"):
    # rest of your code goes here
    pass
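Note that for iter(q.get, "STOP") to ever finish, something has to put the sentinel value on the queue once per worker after all the real URLs have been enqueued. A minimal sketch (NUM_THREADS is just an illustrative name):

NUM_THREADS = 10

# one sentinel per worker thread so every printer() loop can exit
for _ in range(NUM_THREADS):
    q.put("STOP")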

Safe file handling

Also, unless it's absolutely necessary to do otherwise, you should use context managers to open and modify files. Your three file-handling lines would then become:

with open(tmpurl[-35:] + ".jpg", "wb") as wf:
    wf.write(image)

The context manager takes care of closing the file and will do this even if an exception occurs when writing to the file.
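Putting it all together, a sketch of printer() with these suggestions applied could look like this (still assuming your two-Queue setup and a "STOP" sentinel per worker):

def printer(q, missing):
    for tmpurl in iter(q.get, "STOP"):
        try:
            image = urllib2.urlopen(tmpurl).read()
        except httplib.HTTPException:
            missing.put(tmpurl)
            continue
        except (httplib.IncompleteRead, urllib2.URLError):
            continue
        with open(tmpurl[-35:] + ".jpg", "wb") as wf:
            wf.write(image)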
