
Error using Nokogiri

I have this code:

    # encoding: utf-8
    require 'nokogiri'

    s = "<a href='/path/to/file'>Café Verona</a>".encode('UTF-8')
    puts "Original string: #{s}"

    @doc = Nokogiri::HTML::DocumentFragment.parse(s)
    links = @doc.css('a')

    only_text = 'Café Verona'.encode('UTF-8')
    puts "Replacement text: #{only_text}"

    links.first.replace(only_text)
    puts @doc.to_html

However, the output is as follows:

    Original string: <a href='/path/to/file'>Café Verona</a>
    Replacement text: Café Verona
    Café Verona

Why does the text in @doc end up with the wrong encoding?

I tried with and without encode('UTF-8'), and with Document instead of DocumentFragment, but the problem is the same.

I am using Nokogiri v1.5.6 with Ruby 1.9.3p194.

2 answers

It seems to work if you pass a Nokogiri text object instead of a plain string, like this :)

 links.first.replace Nokogiri::XML::Text.new(only_text, @doc) 

I cannot reproduce the problem, but I have two suggestions:

Instead of using:

 s = "<a href='/path/to/file'>Café Verona</a>".encode('UTF-8') 

Try:

 s = "<a href='/path/to/file'>Café Verona</a>" 

Your string is already encoded in UTF-8 because of your # encoding: utf-8 statement. That is why you put that line in a script: it tells Ruby that the string literals in the file are UTF-8. Perhaps you are encoding it twice, although I don’t think Ruby will do that; it should silently ignore the second attempt because the string is already UTF-8.
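As a quick check of that last point, here is a minimal sketch, assuming Ruby 1.9 or later with the magic comment on the first line of the file. The variable names are just for illustration. It compares the bytes before and after the extra encode call:

```ruby
# encoding: utf-8
# Sketch: calling encode('UTF-8') on a string that is already UTF-8
# should leave both the encoding and the bytes unchanged.

literal   = "<a href='/path/to/file'>Café Verona</a>"
reencoded = literal.encode('UTF-8')

puts literal.encoding                   # encoding Ruby assigned to the literal
puts reencoded.encoding                 # encoding after the extra encode call
puts reencoded.bytes == literal.bytes   # true when no transcoding happened
```

If the last line prints true, the extra encode('UTF-8') is harmless and can simply be dropped.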

Another thing I'm wondering about is whether the output:

 Café Verona 

is an indicator that the language / character set encoding of your system and your terminal is set incorrectly. Attempting to output UTF-8 strings on a system set to something else can produce garbled characters in the terminal and / or browser. Windows systems usually use Win-1252, ISO-8859-1, or something similar, not UTF-8. On my Mac OS system, I have the following environment variables set:

    LANG=en_US.UTF-8
    LC_ALL=en_US.UTF-8
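To see what Ruby itself picks up from those variables, and what a mismatch looks like, something along these lines might help. It is only a sketch: it uses ISO-8859-1 as a stand-in for whatever non-UTF-8 charset a terminal might be using, and the variable names are made up for the example.

```ruby
# encoding: utf-8
# Sketch: show the encodings Ruby derives from the environment, and simulate
# a terminal that reads UTF-8 bytes as ISO-8859-1.

utf8_text = "Café Verona"

# Reinterpret the same bytes as ISO-8859-1, then transcode to UTF-8 so the
# misreading can be displayed on a UTF-8 terminal.
misread = utf8_text.dup.force_encoding('ISO-8859-1').encode('UTF-8')

puts "Locale charmap:   #{Encoding.find('locale')}"
puts "default_external: #{Encoding.default_external}"
puts "Correct: #{utf8_text}"
puts "Misread: #{misread}"
```

If the garbled text on your screen looks like the "Misread" line, the strings are probably fine and the terminal or locale settings are the part to fix.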

"Open iso-8859-1 encoded html with nokogiri messes up accents" may also be useful.
