I'm trying to modify an HTML page that is encoded with charset=iso-8859-1.
    doc = Nokogiri::HTML(open(html_file))
    puts doc.to_html

This alone ruins every accented character on the page, so if I save the output, it looks broken in the browser too.
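One plausible cause (an assumption, since the question doesn't show the raw bytes) is that the ISO-8859-1 bytes get labeled as UTF-8 without actually being converted. The standard-library sketch below, assuming Ruby 2.0 or later, shows the difference between relabeling bytes (force_encoding) and transcoding them (encode). The sample word and variable names are illustrative only.

```ruby
# "Economía" as stored in an ISO-8859-1 file: the í is the single byte 0xED.
latin1 = "Econom\xEDa".b.force_encoding(Encoding::ISO_8859_1)

# Correct: transcode, so the character í becomes the two UTF-8 bytes 0xC3 0xAD.
utf8 = latin1.encode(Encoding::UTF_8)

# Wrong: relabel the very same bytes as UTF-8 without converting them.
mislabeled = latin1.dup.force_encoding(Encoding::UTF_8)

puts utf8                        # prints Economía
puts utf8.bytesize               # prints 9 (one more byte than the 8-byte original)
puts mislabeled.valid_encoding?  # prints false: a lone 0xED is not valid UTF-8
```

The point of the comparison: force_encoding never changes any bytes, so it only helps when the bytes already are in the encoding you name; encode is what rewrites them.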
I'm still on Rails 3.0.6. Any tips on how to fix this?
Here is one of the pages suffering from this, for example: http://www.elmundo.es/accesible/elmundo/2012/03/07/solidaridad/1331108705.html
I also asked on GitHub, but I have a feeling I'll get an answer faster here. I'll update both places if I find a fix.
UPDATE 1 (March 24, 2012)
Thanks for the comments. I've managed to solve this problem partially. I now believe it has nothing to do with Nokogiri: as I mentioned in a comment, just opening and saving the file is enough to break the accents.
The closest I've come to a fix is the following:
    thefile = File.open(html_file, "r")
    text = thefile.read
    doc = Nokogiri::HTML(text)
    # ... do any stuff with Nokogiri
    File.open(html_file, 'w') { |f| f.write(doc.to_html) }
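A variation on the snippet above is to state every encoding explicitly instead of relying on Ruby's defaults. The sketch below uses only the standard library and assumes Ruby 2.0 or later; the helper name resave_as_utf8 and the charset regex are hypothetical, not part of the original code. It transcodes ISO-8859-1 to UTF-8 while reading, then rewrites a charset=iso-8859-1 declaration so a browser doesn't keep decoding the saved UTF-8 bytes as ISO-8859-1 (a rough regex, not a robust HTML rewrite). If you would rather hand Nokogiri the raw bytes, Nokogiri::HTML takes the document encoding as its third argument, e.g. Nokogiri::HTML(raw, nil, 'ISO-8859-1'); that call is not exercised here.

```ruby
require "tmpdir"

# Hypothetical helper (not from the question): re-save an ISO-8859-1
# HTML file as UTF-8 with every encoding stated explicitly.
def resave_as_utf8(path)
  # "ISO-8859-1:UTF-8" means: the bytes on disk are ISO-8859-1 and the
  # string in memory should be UTF-8, so Ruby transcodes while reading.
  text = File.read(path, encoding: "ISO-8859-1:UTF-8")

  # ... modify the text here (for example, parse it with Nokogiri) ...

  # Rough, regex-based update of the declared charset, so a browser does
  # not keep decoding the new UTF-8 bytes as ISO-8859-1.
  text = text.gsub(/(charset\s*=\s*["']?)iso-8859-1/i, '\1utf-8')

  # Write UTF-8 explicitly instead of relying on the default encodings.
  File.write(path, text, encoding: "UTF-8")
  text
end

# Usage: build a tiny ISO-8859-1 page in a temporary directory.
result = nil
saved_bytes = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "page.html")
  # 0xED is "í" in ISO-8859-1.
  File.binwrite(path, "<meta charset=iso-8859-1><p>Econom\xEDa</p>".b)
  result = resave_as_utf8(path)
  saved_bytes = File.binread(path)
end

puts result # prints <meta charset=utf-8><p>Economía</p>
```

Converting at read time means every later step, Nokogiri included, works on real UTF-8 characters rather than ISO-8859-1 bytes that may be mislabeled.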
The source file is ISO-8859-1 and it gets saved as UTF-8, which looks fine: the accents are in place. The exception is the accented capital letters :-P, where I get question marks, as in "Econom?a", which should be "í" (an i with an accent).
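The question marks fit the pattern of a single ISO-8859-1 byte (0xED for í, 0xCD for Í) being read as UTF-8 somewhere: on its own, such a byte isn't valid UTF-8, so whatever displays or cleans the text substitutes a placeholder. This is a guess at the cause, not a diagnosis of the actual page. The sketch below, assuming Ruby 2.1 or later for String#scrub, only reproduces the symptom; its strings are made up.

```ruby
# "Economía" with í stored as the ISO-8859-1 byte 0xED, but labeled as UTF-8.
broken = "Econom\xEDa".b.force_encoding(Encoding::UTF_8)

puts broken.valid_encoding?   # prints false
puts broken.scrub("?")        # prints Econom?a, the symptom from the question

# Accented capitals behave the same way: Í is the byte 0xCD in ISO-8859-1.
capital = "ECONOM\xCDA".b.force_encoding(Encoding::UTF_8)
puts capital.scrub("?")       # prints ECONOM?A
```

If the output shows this pattern, the byte was never transcoded from ISO-8859-1, which points back at how the file is read rather than at Nokogiri.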
I think I'm getting closer. If someone has a clue about how to handle the capitals, this is almost solved.
Greetings.