I have a small problem parsing CSV strings that contain German umlauts (ä, ö, ü, Ä, Ö, Ü) in PHP.
Assume the following CSV input:
    w;x;y;z
    48;OSL;Oslo Stock Exchange;B
    49;OTB;Österreichische Termin- und Optionenbörse;C
    50;VIE;Wiener Börse;D
Here is the PHP code that parses the string and builds an array from the CSV data:
    public static function parseCSV($csvString) {
        $rows = str_getcsv($csvString, "\n");

        // Remove headers ..
        $header = array_shift($rows);
        $cols = str_getcsv($header, ';');
        if(!$cols || count($cols)!=4) {
            return null;
        }

        // Parse rows ..
        $data = array();
        foreach($rows as $row) {
            $cols = str_getcsv($row, ';');
            $data[] = array('w'=>$cols[0], 'x'=>$cols[1], 'y'=>$cols[2], 'z'=>$cols[3]);
        }

        if(count($data)>0) {
            return $data;
        }
        return null;
    }
Calling this function with the CSV string above yields:
    Array
    (
        [0] => Array
            (
                [w] => 48
                [x] => OSL
                [y] => Oslo Stock Exchange
                [z] => B
            )
        [1] => Array
            (
                [w] => 49
                [x] => OTB
                [y] => sterreichische Termin- und Optionenbörse
                [z] => C
            )
        [2] => Array
            (
                [w] => 50
                [x] => VIE
                [y] => Wiener Börse
                [z] => D
            )
    )
Note that the leading Ö of the second entry is missing ("sterreichische" instead of "Österreichische"). This only happens when an umlaut comes immediately after the column separator; umlauts elsewhere in a field, such as the ö in "Optionenbörse", are preserved. It also happens when several umlauts appear in sequence at the start of a field, for example ÖÖÖsterreich → sterreich. The CSV string is submitted through an HTML form, so the content arrives URL-encoded. The server runs Linux with UTF-8 encoding, and the CSV line looks correct before parsing.
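One thing worth checking here is the locale. The PHP manual notes that str_getcsv takes locale settings into account, so an LC_CTYPE that is not UTF-8 is a plausible, though unconfirmed, cause of the dropped leading bytes. Below is a minimal sketch for testing that theory. The helper name parseLineWithUtf8Locale and the candidate locale names are assumptions for illustration, and only locales actually installed on the server can be selected.

```php
<?php
// Hypothetical helper (not part of the original code): parse one
// ';'-separated line while a UTF-8 LC_CTYPE locale is active, then
// restore whatever locale was set before.
function parseLineWithUtf8Locale(string $line): array
{
    // Candidate names are assumptions; adjust to what `locale -a` lists.
    $candidates = ['de_DE.UTF-8', 'en_US.UTF-8', 'C.UTF-8'];

    // Passing '0' only queries the current setting without changing it.
    $previous = setlocale(LC_CTYPE, '0');

    // With an array, setlocale tries each name and returns false if none works.
    $selected = setlocale(LC_CTYPE, $candidates);
    if ($selected === false) {
        error_log('No UTF-8 locale could be set; still using ' . var_export($previous, true));
    }

    // Delimiter, enclosure and escape are passed explicitly.
    $fields = str_getcsv($line, ';', '"', '\\');

    // Put the previous locale back so other code is unaffected.
    if ($previous !== false) {
        setlocale(LC_CTYPE, $previous);
    }

    return $fields;
}

$fields = parseLineWithUtf8Locale('49;OTB;Österreichische Termin- und Optionenbörse;C');
var_dump($fields[2]);
```

If the third field keeps its leading Ö with this helper but loses it in parseCSV, the locale is the likely culprit, and a single setlocale call early in the request would be the simpler fix.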
Any ideas?
php csv diacritics
Javaguru