ASCII-8BIT is an alias for BINARY.
open-uri does a funny thing: if the file is smaller than about 10 KB, it returns a StringIO, and if it is larger, it returns a Tempfile. This can be confusing when you are trying to solve encoding problems.
If the files are not huge, I would recommend manually loading them into a String:
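To check the alias claim yourself, here is a small sketch using only core Ruby. The variable names are just for illustration, and the sample bytes are made up.

```ruby
# ASCII-8BIT and BINARY are two names for the same Encoding object.
same_encoding = Encoding::ASCII_8BIT.equal?(Encoding::BINARY)
found_same    = Encoding.find('BINARY') == Encoding.find('ASCII-8BIT')

# String#b returns a copy of the string tagged as binary,
# which is how a raw HTTP body usually arrives.
raw = "caf\xE9".b

puts same_encoding                    # the two constants are one object
puts found_same                       # lookup by either name agrees
puts raw.encoding == Encoding::BINARY # the copy is tagged as binary
```

So a string reported as ASCII-8BIT is simply untagged bytes; nothing has been decoded yet.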
require 'uri'
require 'net/http'
require 'net/https'
uri = URI.parse url_to_file
http = Net::HTTP.new(uri.host, uri.port)
if uri.scheme == 'https'
  http.use_ssl = true
end
body = http.start { |session| session.get uri.request_uri }.body
Then you can use the ensure-encoding gem (https://rubygems.org/gems/ensure-encoding):
require 'ensure/encoding'
utf8_body = body.ensure_encoding('UTF-8', :external_encoding => :sniff, :invalid_characters => :transcode)
I have been very pleased with ensure-encoding; we use it in production at http://data.brighterplanet.com
Note that you can also say :invalid_characters => :ignore instead of :transcode.
Also, if you happen to know the source encoding, you can pass :external_encoding => 'ISO-8859-1' instead of :sniff.
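For the case where the source encoding is already known, a rough standard-library equivalent might look like the sketch below. This is not how ensure-encoding is implemented, and it does no sniffing; `to_utf8` is a hypothetical helper name and the sample bytes are invented.

```ruby
# Hypothetical helper: relabel raw bytes with their known encoding,
# then transcode to UTF-8 using only core String methods.
def to_utf8(bytes, source_encoding)
  bytes.dup                              # leave the caller's string untouched
       .force_encoding(source_encoding)  # tag the bytes; no bytes change here
       .encode('UTF-8',
               invalid: :replace,        # replace malformed byte sequences
               undef:   :replace)        # replace bytes with no UTF-8 mapping
end

body = "caf\xE9".b                       # stand-in for an HTTP body in ISO-8859-1
utf8_body = to_utf8(body, 'ISO-8859-1')

puts utf8_body.encoding
puts utf8_body.valid_encoding?
```

The point of force_encoding before encode is that a binary-tagged string has no character meaning yet, so Ruby needs to be told what the bytes represent before it can convert them.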