PHP str_getcsv removes umlauts

I have a problem parsing CSV strings that contain German umlauts (ä, ö, ü, Ä, Ö, Ü) in PHP.

Assume the following CSV input:

    w;x;y;z
    48;OSL;Oslo Stock Exchange;B
    49;OTB;Österreichische Termin- und Optionenbörse;C
    50;VIE;Wiener Börse;D

And here is the PHP code that parses the string and builds an array from the CSV data:

    public static function parseCSV($csvString) {
        $rows = str_getcsv($csvString, "\n");

        // Remove the header row
        $header = array_shift($rows);
        $cols = str_getcsv($header, ';');
        if(!$cols || count($cols)!=4) {
            return null;
        }

        // Parse the data rows
        $data = array();
        foreach($rows as $row) {
            $cols = str_getcsv($row, ';');
            $data[] = array('w'=>$cols[0], 'x'=>$cols[1], 'y'=>$cols[2], 'z'=>$cols[3]);
        }

        if(count($data)>0) {
            return $data;
        }
        return null;
    }

Calling this function with the CSV string above returns:

    Array
    (
        [0] => Array
            (
                [w] => 48
                [x] => OSL
                [y] => Oslo Stock Exchange
                [z] => B
            )

        [1] => Array
            (
                [w] => 49
                [x] => OTB
                [y] => sterreichische Termin- und Optionenbörse
                [z] => C
            )

        [2] => Array
            (
                [w] => 50
                [x] => VIE
                [y] => Wiener Börse
                [z] => D
            )

    )

Note that the leading Ö of the second entry's y column is missing. This only happens when an umlaut immediately follows the column separator. If several umlauts follow the separator, all of them are dropped: ÖÖÖsterreich becomes sterreich. The CSV string is submitted through an HTML form, so its content is URL-encoded in transit. The server runs Linux with UTF-8, and the CSV string looks correct before parsing.

Any ideas?

2 answers

Assuming fgetcsv (http://php.net/manual/en/function.fgetcsv.php) behaves like str_getcsv(), the manual page says:

Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function.

So you should try setting the locale with setlocale (http://php.net/manual/en/function.setlocale.php).
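A minimal sketch of that suggestion, applied to the parsing logic from the question. It assumes that at least one of the listed UTF-8 locales (`de_DE.UTF-8`, `en_US.UTF-8`, `C.UTF-8`) is installed on the server (`locale -a` lists them); the locale names and the function name `parseCsvWithUtf8Locale` are illustrative, not from the question. Only `LC_CTYPE` is changed, since that is the category governing multibyte character handling, and the previous value is restored afterwards because PHP keeps the locale per process rather than per thread.

```php
<?php
// Sketch: parse the semicolon-separated CSV with a UTF-8 LC_CTYPE locale active.
// Assumption: one of the locale names below exists on the machine; they are examples.
// parseCsvWithUtf8Locale is a hypothetical name, not part of any library.
function parseCsvWithUtf8Locale(string $csvString): ?array
{
    // Remember the current LC_CTYPE so it can be restored afterwards.
    $previous = setlocale(LC_CTYPE, 0);

    // setlocale() tries each candidate in order and returns false if none exists.
    setlocale(LC_CTYPE, 'de_DE.UTF-8', 'en_US.UTF-8', 'C.UTF-8');

    try {
        // Split into lines, accepting \n, \r\n or \r (textarea input often uses \r\n).
        $rows = preg_split('/\R/', trim($csvString));

        // Header row must have exactly four columns, as in the question.
        $header = str_getcsv(array_shift($rows), ';', '"', '\\');
        if (count($header) !== 4) {
            return null;
        }

        $data = [];
        foreach ($rows as $row) {
            $cols = str_getcsv($row, ';', '"', '\\');
            if (count($cols) < 4) {
                continue; // skip malformed rows instead of reading undefined indexes
            }
            $data[] = ['w' => $cols[0], 'x' => $cols[1], 'y' => $cols[2], 'z' => $cols[3]];
        }
        return $data ?: null;
    } finally {
        // Restore whatever locale was active before the call.
        if ($previous !== false) {
            setlocale(LC_CTYPE, $previous);
        }
    }
}
```

The escape argument is passed explicitly with its default value so that the behaviour matches a plain `str_getcsv($row, ';')` call.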

If that does not work, try enabling multibyte function overloading (http://www.php.net/manual/en/mbstring.overload.php). Note that function overloading was deprecated in PHP 7.2 and removed in PHP 8.0.

Or, even better, use a standard framework library, such as Zend or Symfony, to pull data from


I had a similar problem with the ï character in data exported from Microsoft Excel as CSV (yes, with UTF-8 selected under "Web Options" in the "Save As..." dialog). Even so, the result does not seem to be the UTF-8 that str_getcsv expects.

Now I run everything through iconv first and it works fine; the problem seems to be Excel's idea of what a CSV file is:

    iconv -f windows-1252 -t utf8 source.csv > output.csv
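The same conversion can be sketched inside PHP, before the data reaches str_getcsv(), using PHP's iconv() function. This assumes that any input which is not already valid UTF-8 is Windows-1252, which matches the command above but is a heuristic: PHP cannot detect the source encoding on its own. The function name `toUtf8` is hypothetical.

```php
<?php
// Sketch: in-script equivalent of
//   iconv -f windows-1252 -t utf8 source.csv > output.csv
// Assumption: input that is not valid UTF-8 is Windows-1252 (typical of Excel CSV exports).
// toUtf8 is a hypothetical helper name.
function toUtf8(string $bytes): string
{
    // preg_match() with the /u modifier fails on invalid UTF-8, so a result
    // of 1 means the string is already valid UTF-8 and is returned unchanged.
    if (preg_match('//u', $bytes) === 1) {
        return $bytes;
    }

    // Convert from Windows-1252 to UTF-8; iconv() returns false on failure.
    $converted = iconv('Windows-1252', 'UTF-8', $bytes);
    if ($converted === false) {
        throw new RuntimeException('Could not convert input from Windows-1252 to UTF-8');
    }
    return $converted;
}

// Hypothetical usage: convert the raw file contents before splitting them with str_getcsv().
// $csvString = toUtf8(file_get_contents('source.csv'));
```

Checking for valid UTF-8 first avoids double-encoding data that was already saved correctly.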
