How to use pycurl if the requested data is sometimes gzipped, sometimes not?

I do this to get some data:

c = pycurl.Curl()
c.setopt(pycurl.ENCODING, 'gzip')
c.setopt(pycurl.URL, url)
c.setopt(pycurl.TIMEOUT, 10)
c.setopt(pycurl.FOLLOWLOCATION, True)
xml = StringIO()
c.setopt(pycurl.WRITEFUNCTION, xml.write)
c.perform()
c.close()

My URLs are usually of this type:

 http://host/path/to/resource-foo.xml 

I usually get back a 302 redirect, pointing to:

 http://archive-host/path/to/resource-foo.xml.gz 

Since I set FOLLOWLOCATION and ENCODING gzip, everything works fine.

The problem is that sometimes I have a URL that does not redirect to a gzipped resource. When this happens, c.perform() throws this error:

 pycurl.error: (61, 'Error while processing content unencoding: invalid block type') 

Which tells me that pycurl is trying to gunzip a resource that was not gzipped.

Is there a way I can instruct pycurl to figure out the encoding of the response and gunzip it only when necessary? I have tried different values for ENCODING, but no luck so far.

The pycurl docs seem to be a bit lacking. :/

Thanks!

1 answer

If worst comes to worst, you could omit ENCODING 'gzip', set HTTPHEADER to ['Accept-Encoding: gzip'] (pycurl expects a list of header strings), check the response headers for "Content-Encoding: gzip", and if it is present, gunzip the response yourself.
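A rough sketch of what that could look like, assuming Python 3 and a pycurl build where header and body callbacks receive bytes. The names parse_header_line, maybe_gunzip and fetch are made up for illustration. The extra check for the gzip magic bytes (1f 8b) is my own safeguard against a response that is labelled gzip but is not; it is not something pycurl does for you.

```python
import gzip
from io import BytesIO


def parse_header_line(line, headers):
    """Store one raw header line (bytes) in a dict keyed by lowercase name."""
    text = line.decode("iso-8859-1").strip()
    if text.upper().startswith("HTTP/"):
        # A new status line means a new response (e.g. after the 302),
        # so forget the headers of the previous one.
        headers.clear()
    elif ":" in text:
        name, value = text.split(":", 1)
        headers[name.strip().lower()] = value.strip()


def maybe_gunzip(body, headers):
    """Gunzip body only if the final response says Content-Encoding: gzip."""
    encoding = headers.get("content-encoding", "").lower()
    # Assumption: also require the gzip magic bytes, in case the header lies.
    if encoding == "gzip" and body[:2] == b"\x1f\x8b":
        return gzip.decompress(body)
    return body


def fetch(url):
    """Hypothetical helper: fetch url and return the decoded body as bytes."""
    import pycurl  # imported here so the helpers above work without pycurl

    headers = {}
    buf = BytesIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.TIMEOUT, 10)
    c.setopt(pycurl.FOLLOWLOCATION, True)
    # ENCODING is deliberately not set, so libcurl leaves the body alone.
    c.setopt(pycurl.HTTPHEADER, ["Accept-Encoding: gzip"])
    c.setopt(pycurl.HEADERFUNCTION,
             lambda line: parse_header_line(line, headers))
    c.setopt(pycurl.WRITEFUNCTION, buf.write)
    c.perform()
    c.close()
    return maybe_gunzip(buf.getvalue(), headers)
```

With this, xml = fetch(url) should give you the plain XML bytes whether or not the redirect target was gzipped. I have only reasoned through the decoding part, not run it against your hosts.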
