How to catch 404 error in urllib.urlretrieve

Background: I am using urllib.urlretrieve, as opposed to any other function in the urllib* modules, because of its hook function support (see reporthook below), which I use to display a textual progress bar. This is Python >= 2.6.

 >>> urllib.urlretrieve(url[, filename[, reporthook[, data]]]) 
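
For reference, a minimal sketch of such a hook. urlretrieve calls reporthook with the number of blocks transferred so far, the block size in bytes, and the total size (which may be -1 when the server does not report it). The helper name format_progress and the bar width are assumptions made for illustration only.

```python
import sys


def format_progress(block_num, block_size, total_size, width=40):
    """Build a one-line text progress bar from reporthook arguments."""
    if total_size <= 0:
        # Size unknown (e.g. no Content-Length header): show bytes so far.
        return "%d bytes" % (block_num * block_size)
    # The last block is usually partial, so cap the count at the total.
    done = min(block_num * block_size, total_size)
    filled = width * done // total_size
    percent = 100 * done // total_size
    return "[%s%s] %3d%%" % ("#" * filled, "-" * (width - filled), percent)


def my_report_hook(block_num, block_size, total_size):
    """Hook taking (block count, block size, total size) as arguments."""
    sys.stderr.write("\r" + format_progress(block_num, block_size, total_size))
    sys.stderr.flush()
```

The hook only sees transfer counters, never the HTTP status, which is why it cannot tell a 404 page from the real file.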

However, urlretrieve is so dumb that it leaves no way to detect the status of the HTTP request (for example: was it 404 or 200?).

 >>> fn, h = urllib.urlretrieve('http://google.com/foo/bar')
 >>> h.items()
 [('date', 'Thu, 20 Aug 2009 20:07:40 GMT'), ('expires', '-1'), ('content-type', 'text/html; charset=ISO-8859-1'), ('server', 'gws'), ('cache-control', 'private, max-age=0')]
 >>> h.status
 ''
 >>>

What is the best way to download a remote HTTP file with hook-like support (to show a progress bar) and decent HTTP error handling?

+25
python url urllib
Aug 20 '09 at 20:14
3 answers

Check out the complete code of urllib.urlretrieve:

 def urlretrieve(url, filename=None, reporthook=None, data=None):
     global _urlopener
     if not _urlopener:
         _urlopener = FancyURLopener()
     return _urlopener.retrieve(url, filename, reporthook, data)

In other words, you can use urllib.FancyURLopener (this is part of the urllib public API). You can override http_error_default to detect 404s:

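 import urllib

 class MyURLopener(urllib.FancyURLopener):
     def http_error_default(self, url, fp, errcode, errmsg, headers):
         # handle errors the way you'd like to; raising here is one option,
         # so that the caller can catch the failure and inspect errcode
         raise IOError("http error", errcode, errmsg, headers)

 # url and my_report_hook are the caller's own URL and progress hook
 fn, h = MyURLopener().retrieve(url, reporthook=my_report_hook)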
+27
Aug 20 '09 at 21:11

You should use:

 import urllib2

 try:
     resp = urllib2.urlopen("http://www.google.com/this-gives-a-404/")
 except urllib2.URLError, e:
     if not hasattr(e, "code"):
         raise
     resp = e

 print "Gave", resp.code, resp.msg
 print "=" * 80
 print resp.read(80)

Edit: The rationale here is that unless you expect the exceptional condition, it is an exception for it to happen, and you probably did not even think about it. So instead of letting your code keep running after the request failed, the default behavior, quite sensibly, is to stop it from executing.
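
The snippet above does not report progress. One possible way to get both the exception on a 404 and a urlretrieve-style hook is to read the response in blocks yourself. The following is only a sketch: copy_with_hook and download are hypothetical helper names, and the import fallback is an assumption added so the same code also runs on Python 3, where urllib2 became urllib.request.

```python
try:
    from urllib2 import urlopen, HTTPError  # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3
    from urllib.error import HTTPError


def copy_with_hook(resp, out, reporthook=None, block_size=8192, total_size=-1):
    """Copy resp to out in blocks, calling reporthook as urlretrieve does."""
    block_num = 0
    if reporthook:
        # Like urlretrieve, call the hook once before any data is read.
        reporthook(block_num, block_size, total_size)
    while True:
        block = resp.read(block_size)
        if not block:
            break
        out.write(block)
        block_num += 1
        if reporthook:
            reporthook(block_num, block_size, total_size)
    return block_num


def download(url, filename, reporthook=None):
    """Fetch url into filename; an HTTPError (e.g. 404) reaches the caller."""
    resp = urlopen(url)  # raises HTTPError for 4xx and 5xx responses
    try:
        length = resp.info().get("Content-Length")
        total_size = int(length) if length else -1
        with open(filename, "wb") as out:
            copy_with_hook(resp, out, reporthook, total_size=total_size)
    finally:
        resp.close()
```

A caller would wrap download(...) in try/except HTTPError and check e.code == 404, passing the same kind of hook it would give to urlretrieve.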

+14
Feb 04 '10 at 20:17

The retrieve method of the URL opener object supports the reporthook and raises an exception on a 404.

http://docs.python.org/library/urllib.html#url-opener-objects

+2
Aug 20 '09 at 21:13