Parsing CSV with commas, double quotes and encoding

I am using Ruby 1.9 to parse the following CSV file, which contains a MacRoman character:

# encoding: ISO-8859-1
#csv_parse.csv
Name, main-dialogue
"Marceu", "Give it to him ó he, his wife."

I did the following to parse it:

require 'csv'
input_string = File.read("../csv_parse.rb").force_encoding("ISO-8859-1").encode("UTF-8")
 #=> "Name, main-dialogue\r\n\"Marceu\", \"Give it to him  \x97 he, his wife.\"\r\n"

data = CSV.parse(input_string, :quote_char => "'", :col_sep => "/\",/")
 #=> [["Name, main-dialogue"], ["\"Marceu", " \"Give it to him  \x97 he, his wife.\""]]

So the problem is the second data array, which should consist of two strings rather than one: ["\"Marceu\"", " \"Give it to him \x97 he, his wife.\""]. I tried :col_sep => "," (which is the default behavior), but it gave me 3 splits.

header = CSV.parse(input_string, :quote_char => "'")[0].map{|a| a.strip.downcase unless a.nil? }
 #=> ["name", "main-dialogue"]

I have to parse the header separately, since it has no double quotes.

The output is meant to be displayed in a browser, so the symbol ó should be displayed normally, and not as \x97 or something similar.

Is there a way to solve the above problems?

+5
2 answers

First, your data is encoded as MacRoman, not ISO-8859-1; try this in irb:

>> "\x97".force_encoding('MacRoman').encode('UTF-8')

and you will see this:

=> "ó"

So the first thing to do is get the encoding right when you read the file:

input_string = File.read("../csv_parse.rb").force_encoding('MacRoman').encode('UTF-8')
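
As a rough illustration of why the encoding label matters, here is a minimal sketch that transcodes the same byte both ways. The helper name `macroman_to_utf8` is hypothetical (it is not in the question), and the single byte stands in for the file contents:

```ruby
# Hypothetical helper (not part of the question): reinterpret raw bytes as
# MacRoman and transcode them to UTF-8.
def macroman_to_utf8(bytes)
  # dup so the caller's string is not re-tagged in place
  bytes.dup.force_encoding('MacRoman').encode('UTF-8')
end

raw = "\x97" # the problem byte from the file

# ISO-8859-1 maps byte 0x97 to the control character U+0097, which is why
# the question's output still shows an escaped character.
wrong = raw.dup.force_encoding('ISO-8859-1').encode('UTF-8')

# MacRoman maps byte 0x97 to U+00F3, the accented letter we want.
right = macroman_to_utf8(raw)

puts wrong.inspect
puts right
```

Both strings are valid UTF-8 afterwards; only the second one holds the character the file was meant to contain.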

Then, your CSV uses standard double quotes for quoting (so you don't need :quote_char), but the separator is ', ', not ',':

data = CSV.parse(input_string, :col_sep => ", ")

and data now looks like this:

[
    ["Name", "main-dialogue"],
    ["Marceu", "Give it to him  ó he, his wife."]
]
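
If you also want the header row handled in the same pass, one possible approach is to let CSV treat the first line as headers and normalize them with a converter. This is only a sketch: the inline `input_string` stands in for the transcoded file contents, and the `table` and `rows` names are illustrative:

```ruby
require 'csv'

# Stand-in for the file contents after transcoding to UTF-8.
input_string = "Name, main-dialogue\r\n" \
               "\"Marceu\", \"Give it to him \u00F3 he, his wife.\"\r\n"

# :headers => true treats the first line as the header row, and the
# converter strips and downcases each header, as the question does by hand.
table = CSV.parse(
  input_string,
  :col_sep => ", ",
  :headers => true,
  :header_converters => lambda { |h| h.strip.downcase }
)

# Each row becomes a Hash keyed by the normalized header names.
rows = table.map { |row| row.to_hash }

puts table.headers.inspect
puts rows.inspect
```

The comma inside the quoted dialogue is left alone because the field is wrapped in the default quote character, so the header no longer needs a separate second parse.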
+8

Check the :quote_char and :col_sep options.

The quote character should be the default, '"', and :col_sep should be ",".

0
