Ignore Iconv::IllegalSequence when using Ruby WWW::Mechanize

I get an Iconv::IllegalSequence error on some web pages when using the mechanize library. Is there a way to make mechanize simply omit the badly encoded characters and return the truncated page? I know about a related thread, but I would rather lose some characters on the page than reimplement the encoding guessing. TIA

2 answers

The solution is to change line 40 in util.rb from

 Iconv.iconv(code, "UTF-8", s).join("")

to

 Iconv.iconv("#{code}//IGNORE", "UTF-8", s).join("") 

or

 Iconv.conv("#{code}//IGNORE", "UTF-8", s) 
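
On Ruby 1.9 and later, where Iconv is deprecated (it was removed from the standard library in 2.0), a similar "drop what cannot be converted" behaviour can be sketched with String#encode. This is only an illustration under stated assumptions: convert_ignoring_errors is a hypothetical helper name, not part of Mechanize, and the input is assumed to be tagged as UTF-8, matching the "UTF-8" source argument in the lines above.

```ruby
# Hypothetical helper (not part of Mechanize): convert UTF-8 text to the
# target charset `code`, silently dropping bytes that are not valid UTF-8
# and characters that have no equivalent in the target charset.
def convert_ignoring_errors(str, code)
  str.encode(code, "UTF-8", invalid: :replace, undef: :replace, replace: "")
end

# A UTF-8 string containing one stray byte (\xFF) that is not valid UTF-8.
broken = "caf\xC3\xA9 \xFF ok".dup.force_encoding("UTF-8")

converted = convert_ignoring_errors(broken, "ISO-8859-1")
puts converted.encode("UTF-8")
```

The invalid: :replace option covers malformed input bytes and undef: :replace covers characters missing from the target charset; with replace: "" both are removed rather than substituted.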

A better solution does not change the source of util.rb, but adds something like this to your own code:

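 # Redefine the class-level Mechanize::Util.encode_to from application code.
 # define_method has to target the singleton class; passing 'self.encode_to'
 # would only create an instance method with that literal name.
 (class << Mechanize::Util; self; end).send(:define_method, :encode_to) do |encoding, str|
   # Assumes NEW_RUBY_ENCODING is the constant defined inside Mechanize::Util.
   if Mechanize::Util::NEW_RUBY_ENCODING
     str.encode(encoding)
   else
     # //IGNORE makes Iconv discard characters it cannot convert
     Iconv.conv("#{encoding}//IGNORE", "UTF-8", str)
   end
 end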
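
To override a class-level method from your own code, the new definition has to land on the class's singleton class. A minimal sketch of that technique follows, using a stand-in class so that it runs without the gem: DemoUtil is hypothetical and only mimics the shape of a library utility class, and singleton_class requires Ruby 1.9.2 or later.

```ruby
# DemoUtil is a hypothetical stand-in for a library class; it is not Mechanize.
class DemoUtil
  def self.encode_to(encoding, str)
    str.encode(encoding) # raises on invalid or unconvertible input
  end
end

# Replace the class method from application code, leaving the library file
# untouched. The block arguments mirror the original method's parameters.
DemoUtil.singleton_class.send(:define_method, :encode_to) do |encoding, str|
  str.encode(encoding, invalid: :replace, undef: :replace, replace: "")
end

# A UTF-8 string with one invalid byte; the override drops it instead of raising.
bad = "ok \xFF done".dup.force_encoding("UTF-8")
result = DemoUtil.encode_to("ISO-8859-1", bad)
puts result
```

Keeping the patch in application code means it survives gem upgrades, although it still depends on the library continuing to route conversions through that method.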