How to use pycurl if the requested data is sometimes gzipped, sometimes not?

Question

I do this to get some data:

    c = pycurl.Curl()
    c.setopt(pycurl.ENCODING, 'gzip')
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.TIMEOUT, 10)
    c.setopt(pycurl.FOLLOWLOCATION, True)
    xml = StringIO()
    c.setopt(pycurl.WRITEFUNCTION, xml.write)
    c.perform()
    c.close()

My URLs are usually of this type:

 http://host/path/to/resource-foo.xml

I usually get back a 302 redirect pointing to:

 http://archive-host/path/to/resource-foo.xml.gz

Since I set FOLLOWLOCATION and ENCODING 'gzip', everything works fine.

The problem is that sometimes I have a URL that does not redirect to a gzipped resource. When this happens, c.perform() throws this error:

 pycurl.error: (61, 'Error while processing content unencoding: invalid block type')

This tells me that pycurl is trying to gunzip a resource that was not gzipped.

Is there a way to instruct pycurl to detect the response encoding and gunzip only when necessary? I have tried different values for ENCODING, but so far no luck.

The pycurl docs seem to be a bit lacking. :/

Thanks!

python http gzip libcurl pycurl

billc Apr 16 '09 at 10:11

1 answer

Piskvor · Accepted Answer · 2009-04-16T22:20:21+0000

If worse comes to worst, you could omit ENCODING 'gzip', set HTTPHEADER to ['Accept-Encoding: gzip'] (pycurl expects a list of header strings), check the response headers for "Content-Encoding: gzip", and if it is present, gunzip the response yourself.
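
A rough sketch of that approach, assuming Python 3 and a pycurl build that hands header lines to the callback as bytes. The helper names (final_response_headers, decode_body, fetch) are made up for illustration. Because FOLLOWLOCATION is on, the header callback sees the headers of the 302 as well as those of the final response, so the sketch starts over at each status line. The gzip magic-byte check is an extra guard I added for servers that label a plain body as gzip; it is not part of the suggestion above.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def final_response_headers(raw_header_lines):
    """Return the headers of the last response, keyed by lowercase name.

    With FOLLOWLOCATION, libcurl reports header lines for every response
    in the redirect chain, so the dict is reset at each status line.
    """
    headers = {}
    for raw in raw_header_lines:
        line = raw.decode("iso-8859-1").strip()
        if line.startswith("HTTP/"):
            headers = {}
        elif ":" in line:
            name, value = line.split(":", 1)
            headers[name.strip().lower()] = value.strip()
    return headers


def decode_body(body, headers):
    """Gunzip body only if the response says gzip and the bytes look like gzip."""
    declared_gzip = headers.get("content-encoding", "").lower() == "gzip"
    if declared_gzip and body.startswith(GZIP_MAGIC):
        return gzip.decompress(body)
    return body


def fetch(url):
    """Fetch url without pycurl.ENCODING and decompress by hand if needed."""
    import pycurl  # imported here so the helpers above work without pycurl

    body = io.BytesIO()
    header_lines = []
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.TIMEOUT, 10)
    c.setopt(pycurl.FOLLOWLOCATION, True)
    # Ask for gzip ourselves instead of letting libcurl decode it.
    c.setopt(pycurl.HTTPHEADER, ["Accept-Encoding: gzip"])
    c.setopt(pycurl.WRITEFUNCTION, body.write)
    c.setopt(pycurl.HEADERFUNCTION, header_lines.append)
    try:
        c.perform()
    finally:
        c.close()
    return decode_body(body.getvalue(), final_response_headers(header_lines))
```

Calling fetch(url) with either kind of URL from the question should then return the raw XML bytes, whether or not the final response was gzipped.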
