Transparently processing gzip-encoded content with WWW::Mechanize

I use WWW::Mechanize and handle HTTP responses that carry the header 'Content-Encoding: gzip' in my own code: I first check the response headers and then use IO::Uncompress::Gunzip to get the uncompressed content.

However, I would like this to happen transparently, so that WWW::Mechanize methods such as forms(), links(), etc. work on and parse the uncompressed content. Since WWW::Mechanize is a subclass of LWP::UserAgent, I would prefer to use the LWP::UserAgent handlers for this.

Although I have been partially successful (for example, I can print the uncompressed content), I cannot make it transparent enough that I can simply call

 $mech->forms(); 

In short: how can I "replace" the content inside the $mech object so that from then on all WWW::Mechanize methods work as if the Content-Encoding had never been there?

I would be grateful for your attention and help. Thanks

3 answers

It looks to me like you can replace it using the $res->content($bytes) method.

By the way, I found this by looking at the source of LWP::UserAgent, then HTTP::Response, then HTTP::Message.
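As a rough illustration of replacing the content in place, here is a minimal sketch that runs without a network. The helper name inflate_response and the fake response are assumptions made up for this example; only content, header, remove_header and content_length are existing HTTP::Message / HTTP::Headers methods.

```perl
use strict;
use warnings;
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Build a fake gzip-encoded response so the sketch needs no server.
my $html = '<html><body><form action="/search"></form></body></html>';
gzip(\$html => \my $compressed) or die "gzip failed: $GzipError";

my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html', 'Content-Encoding' => 'gzip' ],
    $compressed,
);

# Hypothetical helper: decompress the body in place and drop the
# Content-Encoding header so later code sees plain content.
sub inflate_response {
    my ($response) = @_;
    my $encoding = $response->header('Content-Encoding') // '';
    return $response unless lc $encoding eq 'gzip';

    my $raw = $response->content;
    gunzip(\$raw => \my $bytes) or die "gunzip failed: $GunzipError";

    $response->content($bytes);                  # replace the body
    $response->remove_header('Content-Encoding');
    $response->content_length(length $bytes);    # keep the length consistent
    return $response;
}

inflate_response($res);
print $res->content, "\n";
```

With a live user agent, a helper like this could presumably be registered through $mech->add_handler(response_done => sub { inflate_response($_[0]); return }), since the response_done callback receives the response as its first argument. Whether that alone is enough for forms() and links() depends on how your WWW::Mechanize version reads the response, so treat it as a starting point.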


It is built into LWP::UserAgent and therefore into WWW::Mechanize. One major caveat, to save you some hair:

To debug, be sure to check $@ after calling decoded_content.

 my $html = $r->decoded_content; die $@ unless defined $html;  # undef means decoding failed; the reason is in $@

Better yet, look through the source of HTTP::Message and make sure all the supporting packages are installed.

In my case, decoded_content returned undef while content held the raw binary, and I went on a wild goose chase. The user agent sets the error flag when decoding fails, but Mechanize simply ignores it (it neither checks it nor reports the incident as its own error or warning).

In my case $@ said that IO/HTML.pm could not be found; the failure had been raised inside an eval, so nothing surfaced on its own.
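The failure mode described above can be reproduced offline. The sketch below is only an illustration: the two responses are fabricated, and the encoding name 'x-made-up' is an assumption chosen so that decoding fails. The charset is stated in the Content-Type header so the example does not depend on charset sniffing.

```perl
use strict;
use warnings;
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);

my $html = '<html><body>hello</body></html>';
gzip(\$html => \my $compressed) or die "gzip failed: $GzipError";

# A response the built-in decoder should be able to handle.
my $good = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html; charset=UTF-8', 'Content-Encoding' => 'gzip' ],
    $compressed,
);

# Same body, but with a made-up encoding name to force a decoding failure.
my $bad = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html; charset=UTF-8', 'Content-Encoding' => 'x-made-up' ],
    $compressed,
);

for my $r ($good, $bad) {
    my $decoded = $r->decoded_content;
    if (defined $decoded) {
        print "decoded ", length($decoded), " characters\n";
    }
    else {
        # decoded_content returned undef; the reason was left in $@.
        my $err = $@;
        chomp $err;
        print "decoding failed: $err\n";
    }
}

# Alternatively, ask decoded_content to throw instead of returning undef.
my $ok = eval { $bad->decoded_content(raise_error => 1); 1 };
print "raise_error died: $@" unless $ok;
```

Passing raise_error => 1 is the documented way to turn a silent undef into an exception, which may be easier to spot than a forgotten $@ check.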

After diving into the source, I found that the built-in decoding process is long, thorough, and elaborate, covering almost every scenario and doing a lot of guesswork (thanks, Gisle!).

If you are paranoid, explicitly set the default headers to send with every request in new():

  my $browser = WWW::Mechanize->new(default_headers => HTTP::Headers->new('Accept-Encoding' => scalar HTTP::Message::decodable()));
