Can Python urllib2 automatically unzip gzip data from a web page?

Question


I use:

    data = urllib2.urlopen(url).read()

I want to know:

How can I tell whether the data at a URL is gzipped?
Does urllib2 automatically decompress the data if it is gzipped? Will the data always be a string?


python gzip urllib2

mlzboy Oct 16 '10 at 0:45


3 answers

If you mean a plain .gz file, then no: urllib2 will not decode it, and you will get the unchanged .gz file as output.
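For the plain .gz file case there is no Content-Encoding header to consult, so one way to tell whether the bytes you got back are gzip data is to check for the two-byte gzip magic number (0x1f 0x8b) at the start. This is a minimal sketch, not part of the original answer; the helper names looks_gzipped and gunzip_if_needed are hypothetical, and the code runs on Python 2.7 and Python 3.

```python
import gzip
import io

# Every gzip member starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(data):
    """Return True if the byte string starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def gunzip_if_needed(data):
    """Decompress gzip data; return any other bytes unchanged."""
    if not looks_gzipped(data):
        return data
    # GzipFile wants a file object, so wrap the bytes in an in-memory buffer.
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as f:
        return f.read()
```

For example, data = gunzip_if_needed(urllib2.urlopen(url).read()). The magic-number check is a heuristic: uncompressed data could in principle also begin with those two bytes.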

If you mean automatic HTTP-level compression using Content-Encoding: gzip or deflate, that has to be requested deliberately by the client with an Accept-Encoding header.
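Asking for compression just means sending that header yourself. A minimal sketch, assuming you want one code path for both Python versions: the try/except import shim covers the rename of urllib2 to urllib.request in Python 3, and build_request is a hypothetical helper name. Nothing goes over the network until the request is passed to urlopen.

```python
try:
    import urllib2 as urlreq  # Python 2
except ImportError:
    import urllib.request as urlreq  # Python 3: urllib2 became urllib.request


def build_request(url, encodings="gzip"):
    """Build a Request that tells the server which content codings we accept."""
    request = urlreq.Request(url)
    # Explicitly opt in to compressed responses; urllib2 never does this for you.
    request.add_header("Accept-Encoding", encodings)
    return request
```

Both versions store header names via str.capitalize(), which is why reading the header back uses request.get_header("Accept-encoding").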

urllib2 does not set this header, so a well-behaved server will send the response uncompressed. You can safely fetch the resource without worrying about compression (although, since compression isn't used, the download may take longer).


bobince Oct 16 '10 at 1:28


Your question has already been answered, but for a more complete implementation, take a look at Pilgrim's implementation of this. It covers gzip, deflate, safe URL parsing and much, much more. It was written for a widely used RSS parser, but it is a useful reference nonetheless.
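Handling deflate as well as gzip has one well-known wrinkle: some servers send a raw deflate stream instead of the zlib-wrapped stream that HTTP's "deflate" coding calls for. The sketch below is not Pilgrim's code; it is a minimal illustration, using the hypothetical helper name decode_body, of decompressing a body according to its Content-Encoding value and falling back to raw deflate when the zlib header is missing. It runs on Python 2.7 and Python 3.

```python
import gzip
import io
import zlib


def decode_body(body, content_encoding):
    """Decompress an HTTP body according to its Content-Encoding header value."""
    encoding = (content_encoding or "").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        with gzip.GzipFile(fileobj=io.BytesIO(body)) as f:
            return f.read()
    if encoding == "deflate":
        try:
            # Spec-compliant "deflate": a zlib-wrapped stream (RFC 1950).
            return zlib.decompress(body)
        except zlib.error:
            # Some servers send raw deflate (RFC 1951) with no zlib header;
            # a negative wbits tells zlib to expect no header.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    # "identity", no header at all, or a coding we don't handle: return as-is.
    return body
```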


RuiDC Aug 09 '11 at 20:05


ars · Accepted Answer · 2010-10-16 01:21

How can I tell whether the data at a URL is gzipped?

This checks whether the response is gzipped and, if so, decompresses it:

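    import gzip
    import urllib2
    from StringIO import StringIO

    request = urllib2.Request('http://example.com/')
    request.add_header('Accept-Encoding', 'gzip')
    response = urllib2.urlopen(request)
    if response.info().get('Content-Encoding') == 'gzip':
        # The body is gzip-compressed: wrap it in a file object and decompress.
        buf = StringIO(response.read())
        f = gzip.GzipFile(fileobj=buf)
        data = f.read()
    else:
        # No Content-Encoding header: the body can be used as-is.
        data = response.read()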
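urllib2 only exists on Python 2; on Python 3 the same Content-Encoding check works with urllib.request. The sketch below is a loose Python 3 translation of the answer's idea, not the answerer's code: fetch and start_demo_server are hypothetical helpers, and to stay self-contained it starts a throwaway local http.server that gzips its reply only when the client asks for it, so no outside network access is needed. The opener ignores proxy environment variables purely so the localhost demo connects directly.

```python
import gzip
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

PAYLOAD = b"<html>hello, gzip</html>"

# Opener that ignores any *_proxy environment variables, so requests to
# the local demo server always connect directly.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class GzipHandler(BaseHTTPRequestHandler):
    """Demo server: gzips PAYLOAD only if the client accepts gzip."""

    def do_GET(self):
        gzipped = "gzip" in self.headers.get("Accept-Encoding", "")
        body = gzip.compress(PAYLOAD) if gzipped else PAYLOAD
        self.send_response(200)
        if gzipped:
            self.send_header("Content-Encoding", "gzip")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def start_demo_server():
    """Serve GzipHandler on a free localhost port; return (server, url)."""
    server = HTTPServer(("127.0.0.1", 0), GzipHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, "http://127.0.0.1:%d/" % server.server_port


def fetch(url):
    """Ask for gzip; return (was_gzipped, body) with the body decompressed."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with OPENER.open(request) as response:
        was_gzipped = response.headers.get("Content-Encoding") == "gzip"
        raw = response.read()  # urllib does not decompress this for us
    data = gzip.decompress(raw) if was_gzipped else raw
    return was_gzipped, data
```

Calling server, url = start_demo_server() and then fetch(url) should give (True, PAYLOAD); call server.shutdown() and server.server_close() when done.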

Does urllib2 automatically decompress the data if it is gzipped? Will the data always be a string?

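No. urllib2 does not decompress the data automatically. The Accept-Encoding header is not set by urllib2; you set it yourself with request.add_header('Accept-Encoding', 'gzip, deflate'), so decompressing the response is up to you as well. If you also ask for deflate, check for Content-Encoding: deflate too, since the code above only handles gzip. As for the type: read() returns the raw body as a byte string (str in Python 2), whether or not it is compressed.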
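On the string question: once the bytes are decompressed, turning them into text still means decoding them with the charset named in the Content-Type header. A minimal sketch with the hypothetical helper body_to_text; it borrows the standard library's email.message.Message only to parse the header. Falling back to ISO-8859-1 when no charset is given is an assumption (it was the old HTTP/1.1 default for text types), not something urllib2 does for you. It runs on Python 2.7 and Python 3.

```python
from email.message import Message


def body_to_text(body, content_type, default_charset="iso-8859-1"):
    """Decode already-decompressed body bytes using the Content-Type charset."""
    headers = Message()
    if content_type:
        headers["Content-Type"] = content_type
    # get_content_charset() parses e.g. "text/html; charset=UTF-8" -> "utf-8"
    # and returns None when no charset parameter is present.
    charset = headers.get_content_charset() or default_charset
    # "replace" substitutes U+FFFD for bytes that don't fit the charset.
    return body.decode(charset, "replace")
```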
