Can Python's urllib2 automatically decompress gzip data fetched from a web page?

I use

    data = urllib2.urlopen(url).read()

I want to know:

  • How can I tell whether the data at a URL is gzipped?

  • Does urllib2 automatically decompress data if it is gzipped? Will the data always be a string?

+64
python gzip urllib2
Oct 16 '10 at 0:45
3 answers
  • How can I tell whether the data at a URL is gzipped?

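This checks whether the content is gzipped and decompresses it:

    from StringIO import StringIO
    import gzip
    import urllib2

    request = urllib2.Request('http://example.com/')
    # Ask the server for a gzip-compressed response.
    request.add_header('Accept-encoding', 'gzip')
    response = urllib2.urlopen(request)
    if response.info().get('Content-Encoding') == 'gzip':
        # Wrap the compressed bytes in a file-like object for GzipFile.
        buf = StringIO(response.read())
        f = gzip.GzipFile(fileobj=buf)
        data = f.read()
    else:
        # The server sent the body uncompressed.
        data = response.read()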
  • Does urllib2 automatically decompress the data if it is gzipped? Will the data always be a string?

No. urllib2 does not automatically decompress the data, because the "Accept-Encoding" header is not set by urllib2. You set it yourself with: request.add_header('Accept-Encoding', 'gzip, deflate')
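If you request 'gzip, deflate' as above, the response may come back in either encoding, so the decoding step has to handle both. A minimal sketch, assuming you pass in the raw bytes from response.read() and the value of response.info().get('Content-Encoding'); decode_body is a hypothetical helper name, not part of urllib2:

```python
import gzip
import io
import zlib


def decode_body(body, content_encoding):
    """Return the decompressed body for a given Content-Encoding value.

    Hypothetical helper for illustration; not part of urllib2.
    """
    encoding = (content_encoding or '').strip().lower()
    if encoding == 'gzip':
        # GzipFile needs a file-like object, so wrap the bytes.
        return gzip.GzipFile(fileobj=io.BytesIO(body)).read()
    if encoding == 'deflate':
        try:
            # Standard form: a deflate stream inside a zlib wrapper.
            return zlib.decompress(body)
        except zlib.error:
            # Some servers send a raw deflate stream without the wrapper.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    # No encoding, or one this sketch does not handle: return as-is.
    return body
```

With urllib2 this would be called as decode_body(response.read(), response.info().get('Content-Encoding')). The function itself does no network access, so it can be tried on locally compressed data.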

+139
Oct 16 '10 at 1:21

If you are talking about a plain .gz file, then no, urllib2 will not decode it. You will get the unchanged .gz file as output.

If you are talking about automatic HTTP-level compression using Content-Encoding: gzip or deflate, then the client must explicitly request it with an Accept-Encoding header.

urllib2 does not set this header, so the response it returns will not be compressed. You can safely fetch a resource without worrying about compression (although, because the transfer is uncompressed, the request may take longer).
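For the plain .gz case, where the downloaded bytes arrive unchanged, one way to tell whether the data is gzipped is to look at its first two bytes: every gzip stream starts with 0x1f 0x8b. A minimal sketch under that assumption; looks_gzipped and maybe_gunzip are hypothetical names, and the check is only a heuristic, since arbitrary binary data could happen to start with the same two bytes:

```python
import gzip
import io

# First two bytes of every gzip stream (the gzip magic number).
GZIP_MAGIC = b'\x1f\x8b'


def looks_gzipped(data):
    """Heuristic: True if the data starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def maybe_gunzip(data):
    """Decompress data if it looks gzipped, otherwise return it unchanged."""
    if looks_gzipped(data):
        return gzip.GzipFile(fileobj=io.BytesIO(data)).read()
    return data
```

Here data would be the result of urllib2.urlopen(url).read(). Checking the bytes complements checking the Content-Encoding header, because a .gz file served as a file carries no such header.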

+7
Oct. 16 '10 at 1:28

Your question has been answered, but for a more complete implementation, take a look at Pilgrim's implementation of this. It covers gzip, deflate, safe URL parsing and much more. It was written for a widely used RSS parser, but it is a useful reference nonetheless.

+5
Aug 09 '11 at 20:05