Oh, the joys of character encodings!
Here's what is happening. Internally, Ruby stores the string you retrieved as a sequence of bytes: the UTF-8 encoding of the file name. When you call URI.escape on it, the non-ASCII bytes are escaped in %xy form, and the resulting string, which now consists only of ASCII characters, is used as the URL.
However, the receiving server unescapes these bytes from the %xy form and then interprets them as if they were in a different encoding, in this case ISO-8859-1. The file name it ends up looking for therefore doesn't match any file it actually has.
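To make that mismatch concrete, here is a minimal sketch (Ruby 1.9 or later) that takes the UTF-8 bytes of the file name and decodes them the way the server apparently does, as ISO-8859-1. The variable names are just for illustration.

```ruby
# encoding: utf-8
# The file name as the client holds it: UTF-8, two bytes per accented letter.
name = "ÖÇÄÜ360ÓïÒôÖúÀí.txt"

# These raw bytes are what ends up %xy-escaped in the URL.
utf8_bytes = name.bytes
puts utf8_bytes.first(4).map { |b| format('%02X', b) }.join(' ')  # C3 96 C3 87

# A server assuming ISO-8859-1 turns every byte into its own character,
# so "Ö" (bytes C3 96) becomes "Ã" followed by the control character U+0096.
as_latin1 = name.dup.force_encoding('ISO-8859-1').encode('UTF-8')

puts as_latin1 == name  # false: the server looks up a different file name
```

The 19-character name becomes a 31-character one on the server side, one character per byte, which is why nothing matches.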
Here is a demo using Ruby 1.9, since it has better support for encodings:
    1.9.3-p194 :003 > f
     => "ÖÇÄÜ360ÓïÒôÖúÀí.txt"
    1.9.3-p194 :004 > f.encoding
     => #<Encoding:UTF-8>
    1.9.3-p194 :005 > URI.escape f
     => "%C3%96%C3%87%C3%84%C3%9C360%C3%93%C3%AF%C3%92%C3%B4%C3%96%C3%BA%C3%80%C3%AD.txt"
    1.9.3-p194 :006 > g = f.encode 'iso-8859-1'
     => "\xD6\xC7\xC4\xDC360\xD3\xEF\xD2\xF4\xD6\xFA\xC0\xED.txt"
    1.9.3-p194 :007 > g.encoding
     => #<Encoding:ISO-8859-1>
    1.9.3-p194 :008 > URI.escape g
     => "%D6%C7%C4%DC360%D3%EF%D2%F4%D6%FA%C0%ED.txt"
So the solution in this case is to encode the string as ISO-8859-1 before escaping it. In Ruby 1.9 you do this as shown above; in earlier versions you can use Iconv (I'm assuming JRuby includes Iconv, but I'm not really familiar with JRuby):
    1.8.7 :001 > f
     => "\303\226\303\207\303\204\303\234360\303\223\303\257\303\222\303\264\303\226\303\272\303\200\303\255.txt"
    1.8.7 :005 > g = Iconv.conv('iso-8859-1', 'utf-8', f)
     => "\326\307\304\334360\323\357\322\364\326\372\300\355.txt"
    1.8.7 :006 > URI.escape f
     => "%C3%96%C3%87%C3%84%C3%9C360%C3%93%C3%AF%C3%92%C3%B4%C3%96%C3%BA%C3%80%C3%AD.txt"
    1.8.7 :007 > URI.escape g
     => "%D6%C7%C4%DC360%D3%EF%D2%F4%D6%FA%C0%ED.txt"
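On newer Rubies, URI.escape is deprecated and was removed in Ruby 3.0, so here is a minimal sketch of the same fix without it. The percent_encode helper is hypothetical, not a library method: it escapes every byte outside the RFC 3986 unreserved ASCII set. Because it works byte by byte, the result depends entirely on which encoding the string is in when you call it.

```ruby
# encoding: utf-8
# Bytes allowed unescaped in a URL path segment (RFC 3986 "unreserved").
UNRESERVED = [*('A'..'Z'), *('a'..'z'), *('0'..'9'), '-', '.', '_', '~'].map(&:ord)

# Hypothetical helper: percent-encode a string byte by byte.
def percent_encode(str)
  str.bytes.map { |b| UNRESERVED.include?(b) ? b.chr : format('%%%02X', b) }.join
end

name = "ÖÇÄÜ360ÓïÒôÖúÀí.txt"

# Escaping the UTF-8 form gives two escaped bytes per accented letter.
puts percent_encode(name)
# => %C3%96%C3%87%C3%84%C3%9C360%C3%93%C3%AF%C3%92%C3%B4%C3%96%C3%BA%C3%80%C3%AD.txt

# Converting to ISO-8859-1 first gives one escaped byte per accented letter,
# which is what a server decoding as ISO-8859-1 expects.
puts percent_encode(name.encode('ISO-8859-1'))
# => %D6%C7%C4%DC360%D3%EF%D2%F4%D6%FA%C0%ED.txt
```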
Note that in general you can't rely on the server using any particular encoding. It should use UTF-8, but this one evidently doesn't.
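Since you can't know the server's encoding in advance, one option is to build both candidate paths and try the UTF-8 one first, falling back to ISO-8859-1. This is only a sketch with hypothetical helper names; it builds the candidates and leaves the actual requests to you. It also shows that ISO-8859-1 can't represent every name: a character such as € makes String#encode raise Encoding::UndefinedConversionError, and that candidate is skipped.

```ruby
# encoding: utf-8
# Hypothetical helper: percent-encode every byte except unreserved ASCII ones.
def percent_encode(str)
  keep = [*('A'..'Z'), *('a'..'z'), *('0'..'9'), '-', '.', '_', '~'].map(&:ord)
  str.bytes.map { |b| keep.include?(b) ? b.chr : format('%%%02X', b) }.join
end

# Hypothetical helper: escaped forms of name in the order to try them,
# UTF-8 first (what servers should use), then ISO-8859-1 as a fallback.
def candidate_paths(name, encodings = %w[UTF-8 ISO-8859-1])
  encodings.map { |enc|
    begin
      percent_encode(name.encode(enc))
    rescue Encoding::UndefinedConversionError
      nil # name has no representation in this encoding
    end
  }.compact.uniq
end

p candidate_paths("ÖÇÄÜ360ÓïÒôÖúÀí.txt")  # two different candidates
p candidate_paths("report.txt")            # => ["report.txt"] (ASCII: both agree)
p candidate_paths("Preis_€5.txt")          # => ["Preis_%E2%82%AC5.txt"]
```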
matt