How to set the Mechanize page encoding?

I am trying to get an ISO-8859-1 encoded page by clicking a link, so the code is similar to this:

page_result = page.link_with( :text => 'link_text' ).click

So far I get the result with the wrong encoding, so I see characters like:

'T tulo:' instead of 'Título:'

I tried several approaches, including:

  • Specifying the encoding in the first request using the agent:

    @page_search = @agent.get(
      :url => 'http://www.server.com',
      :headers => { 'Accept-Charset' => 'ISO-8859-1' } )
    
  • Specifying the encoding of the page itself

      page_result.encoding = 'ISO-8859-1'
    

But I must be doing something wrong: a simple puts always shows the wrong characters.

Do you know how to specify an encoding?

Thanks in advance,

Added: Executable example:

require 'rubygems'
require 'mechanize'

WWW::Mechanize::Util::CODE_DIC[:SJIS] = "ISO-8859-1"

@agent = WWW::Mechanize.new

@page = @agent.get(
  :url => 'http://www.mcu.es/webISBN/tituloSimpleFilter.do?cache=init&layout=busquedaisbn&language=es',
  :headers => { 'Accept-Charset' => 'utf-8' } )

puts @page.body
4 answers

Hey, you can just do:

agent.page.encoding = 'utf-8'

Hope this helps!


For example, set the encoding on the page after fetching it:

agent = Mechanize.new

page = agent.get('http://example.com')

page.encoding = 'windows-1251'

page.search('p').each do |para|
  puts para.text
end
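
To see why declaring the right encoding matters, here is a minimal sketch that uses only core Ruby strings and no Mechanize at all. The sample bytes are an assumption standing in for what an ISO-8859-1 server might send for 'Título:'; the variable names are made up for illustration.

```ruby
# Assumed sample: "Título:" as ISO-8859-1 bytes, where í is the single byte 0xED.
raw = "T\xEDtulo:".b

# Misread as UTF-8, the lone 0xED byte is not a valid sequence,
# which is why the accented character goes missing on output.
misread = raw.dup.force_encoding('UTF-8')
puts misread.valid_encoding?   # false

# Label the bytes with their real encoding, then transcode to UTF-8.
latin1 = raw.dup.force_encoding('ISO-8859-1')
utf8 = latin1.encode('UTF-8')
puts utf8                      # Título:
```

Setting page.encoding plays a similar role for the parsed page: it states which encoding the received bytes are in, so the text can be decoded correctly.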

, : Java, utf-16. , Ruby . , .

: Ruby .


, Mechanize ( Ruby NKF), ) .

, :
WWW::Mechanize::Util::CODE_DIC[:SJIS] = "ISO-8859-1"

, , CODE_DICT Hash :)
.
