I am using Ruby 1.9.2.
I am trying to parse a CSV file containing several French words (e.g. spécifié) and put the contents in a MySQL database.
When I read lines from a CSV file,
file_contents = CSV.read("csvfile.csv", col_sep: "$")
the items are returned as ASCII-8BIT encoded strings (spécifié becomes sp\xE9cifi\xE9), and strings like "spécifié" are then NOT stored properly in my MySQL database.
Yehuda Katz says that ASCII-8BIT is really "binary" data, which means that CSV does not know which encoding the file actually uses.
So I tried to make CSV use a specific encoding, as follows:
file_contents = CSV.read("csvfile.csv", col_sep: "$", encoding: "UTF-8")
but I get the following error:
ArgumentError: invalid byte sequence in UTF-8:
If I go back to the original ASCII-8BIT encoded strings, the line that CSV reads looks like "Non sp\xE9cifi\xE9" instead of "Non spécifié".
I cannot convert "Non sp\xE9cifi\xE9" to "Non spécifié" by calling "Non sp\xE9cifi\xE9".encode("UTF-8")
because I get this error:
Encoding::UndefinedConversionError: "\xE9" from ASCII-8BIT to UTF-8
which, as Katz pointed out, happens because ASCII-8BIT is not really a proper string "encoding".
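To make the failure concrete, here is a minimal sketch that reproduces it and then shows one possible way around it. It assumes the bytes are ISO-8859-1 (Latin-1), where the byte 0xE9 is "é"; that is a guess based on the \xE9 escapes above, not something the file itself confirms.

```ruby
# The bytes as CSV returned them: tagged ASCII-8BIT (binary), so Ruby has
# no character set to convert from.
raw = "Non sp\xE9cifi\xE9".dup.force_encoding("ASCII-8BIT")

# Reproduce the error: transcoding straight from binary fails on byte 0xE9.
error = nil
begin
  raw.encode("UTF-8")
rescue Encoding::UndefinedConversionError => e
  error = e
end

# ASSUMPTION: the source bytes are ISO-8859-1. force_encoding only relabels
# the existing bytes; encode then transcodes them into UTF-8 bytes.
utf8 = raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts error.class
puts utf8.encoding
puts utf8.valid_encoding?
```

If the file turns out to be in a different single-byte encoding, the same two-step pattern would apply with that encoding name in place of "ISO-8859-1".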
Questions:
- Can I get CSV to read my file in the appropriate encoding? If so, how?
- How do I convert an ASCII-8BIT string to UTF-8 for proper storage in MySQL?
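For the first question, one approach that may work is to pass CSV an "external:internal" encoding pair, so the file is decoded from its real encoding and handed back as UTF-8. The sketch below is only an illustration: the sample file and its contents are hypothetical stand-ins for csvfile.csv, and ISO-8859-1 is again an assumption about how the real file was saved.

```ruby
require "csv"
require "tmpdir"

# Hypothetical stand-in for csvfile.csv: one "$"-separated row written as
# raw ISO-8859-1 bytes (0xE9 is "é" in that encoding).
path = File.join(Dir.mktmpdir, "csvfile.csv")
File.open(path, "wb") { |f| f.write("1$Non sp\xE9cifi\xE9\n") }

# ASSUMPTION: the file is ISO-8859-1. "ISO-8859-1:UTF-8" asks Ruby to decode
# the file as ISO-8859-1 and transcode each field to UTF-8 while reading.
rows = CSV.read(path, col_sep: "$", encoding: "ISO-8859-1:UTF-8")
field = rows[0][1]

puts field.encoding
puts field.valid_encoding?
```

The fields then arrive as UTF-8 strings. Whether they are stored correctly still depends on the MySQL connection and column character sets, which this sketch does not cover.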
string ruby encoding csv utf-8
user141146 Aug 13 '11 at 1:27