Missing first field character in csv

I am working on a CSV import script in PHP. It works great, except that a non-ASCII character at the beginning of a field gets dropped.

The code looks like this:

if (($handle = fopen($filename, "r")) !== FALSE)
{
     while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) 
         $teljing[] = $data;

     fclose($handle);
}

Here is sample data showing my problem

føroyskir stavir, "Kr. 201,50"
óvirkin ting, "Kr. 100,00"

This will lead to the following

array 
(
     [0] => array 
          (
                 [0] => 'føroyskir stavir',
                 [1] => 'Kr. 201,50'
          )
     [1] => array 
          (
                 [0] => 'virkin ting', <--- Should be 'óvirkin ting'
                 [1] => 'Kr. 100,00'
          )
)

I saw this behavior documented in some comments at php.net, and I tried ini_set('auto_detect_line_endings', TRUE); to detect line endings. No success.

Is anyone familiar with this problem?

Edit:

Thanks, AJ, this issue is now resolved.

setlocale(LC_ALL, 'en_US.UTF-8');

That was the solution.
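
For anyone landing here later, this is a minimal sketch of how that fix might slot into the import loop from the question. The helper name `read_utf8_csv` and the list of candidate locales are my own illustrative choices, not anything from PHP itself, and which locale names are actually installed depends on the system. If none of the candidates is available, the sketch simply reads with whatever locale is already active.

```php
<?php
// Hypothetical helper (not part of PHP): read a UTF-8 CSV file
// after switching LC_CTYPE to a UTF-8 locale, then restore the old locale.
function read_utf8_csv(string $filename, array $locales = ['en_US.UTF-8', 'C.UTF-8']): array
{
    // Passing "0" only queries the current locale; it changes nothing.
    $previous = setlocale(LC_CTYPE, '0');

    // setlocale() tries each candidate in turn and returns false if none
    // is installed. The candidate names here are assumptions.
    setlocale(LC_CTYPE, $locales);

    $rows = [];
    if (($handle = fopen($filename, 'r')) !== false) {
        // Length 0 means "no line length limit"; delimiter, enclosure and
        // escape are spelled out so the call does not rely on defaults.
        while (($data = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            $rows[] = $data;
        }
        fclose($handle);
    }

    // Put the previous locale back so the rest of the script is unaffected.
    if ($previous !== false) {
        setlocale(LC_CTYPE, $previous);
    }
    return $rows;
}
```

Calling `read_utf8_csv($filename)` would then take the place of the fopen/fgetcsv loop in the question. Restoring the locale afterwards is a design choice on my part; the original fix just calls setlocale once at the top of the script.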

+5
2 answers

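From the PHP manual for fgetcsv():

"Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function."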

+6

PHP.net/fgetcsv:

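kent at marketruler dot com, 04-Feb-2010 11:18:

Note that fgetcsv, at least in PHP 5.3 or earlier, will NOT work with UTF-16 encoded files. Your options are to convert the entire file to ISO-8859-1 (or latin1), or to convert line by line into ISO-8859-1 and then use str_getcsv (or a backwards-compatible implementation). If you need to read non-Latin alphabets, it is probably best to convert to UTF-8.

See str_getcsv for a backwards-compatible version for PHP < 5.3, and see utf8_decode for a function written by Rasmus Andersson that provides utf16_decode. The modification I added deals with the fact that the BOM appears at the top of the file but not on subsequent lines. So you need to store the endianness and pass it back in when decoding each subsequent line. This modified version returns the endianness if it is not already available: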

<?php
/**
 * Decode UTF-16 encoded strings.
 *
 * Can handle both BOM'ed data and un-BOM'ed data.
 * Assumes Big-Endian byte order if no BOM is available.
 * From: http://php.net/manual/en/function.utf8-decode.php
 *
 * @param   string  $str  UTF-16 encoded data to decode.
 * @return  string  ISO-8859-1 encoded data.
 * @access  public
 * @version 0.1 / 2005-01-19
 * @author  Rasmus Andersson {@link http://rasmusandersson.se/}
 * @package Groupies
 */
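function utf16_decode($str, &$be=null) {
    $len = strlen($str);
    if ($len < 2) {
        return $str;
    }
    // Use [] offsets: the {} string-offset syntax was removed in PHP 8.
    $c0 = ord($str[0]);
    $c1 = ord($str[1]);
    $start = 0;
    if ($c0 == 0xFE && $c1 == 0xFF) {
        // BOM FE FF: big-endian.
        $be = true;
        $start = 2;
    } else if ($c0 == 0xFF && $c1 == 0xFE) {
        // BOM FF FE: little-endian.
        $be = false;
        $start = 2;
    }
    if ($be === null) {
        // No BOM and no byte order passed in by the caller: assume big-endian.
        $be = true;
    }
    $newstr = '';
    // Stop before a dangling odd byte at the end of the input.
    for ($i = $start; $i + 1 < $len; $i += 2) {
        // Combine the two bytes of each code unit; the high byte is shifted by 8 bits.
        if ($be) {
            $val = (ord($str[$i]) << 8) + ord($str[$i+1]);
        } else {
            $val = (ord($str[$i+1]) << 8) + ord($str[$i]);
        }
        if ($val == 0x2028) {
            // U+2028 (LINE SEPARATOR) becomes a plain newline.
            $newstr .= "\n";
        } else {
            // Code points above 0xFF do not fit in one byte; emit "?" for them.
            $newstr .= ($val <= 0xFF) ? chr($val) : '?';
        }
    }
    return $newstr;
}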
?>

Trying the "setlocale" trick did not work for me, e.g.

<?php
setlocale(LC_CTYPE, "en.UTF16");
$line = fgetcsv($file, ...)
?>

In short, fgetcsv does not seem to cope with UTF-16 input, whatever the locale.

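
The "convert, then use str_getcsv" route described above could be sketched roughly as follows. This is only an illustration under a few assumptions: the mbstring extension is loaded, the whole file fits in memory, and no quoted field contains a line break (the sketch splits on newlines before parsing). The function name `parse_utf16_csv` is hypothetical. It converts to UTF-8 rather than ISO-8859-1, as the comment suggests for non-Latin text.

```php
<?php
// Hypothetical helper (not part of PHP): parse UTF-16 CSV data into rows
// of UTF-8 fields. Requires the mbstring extension.
function parse_utf16_csv(string $raw): array
{
    // Pick the byte order from the BOM; with no BOM, assume big-endian,
    // the same assumption utf16_decode() makes.
    $encoding = 'UTF-16BE';
    if (strncmp($raw, "\xFF\xFE", 2) === 0) {
        $encoding = 'UTF-16LE';
        $raw = substr($raw, 2);
    } elseif (strncmp($raw, "\xFE\xFF", 2) === 0) {
        $raw = substr($raw, 2);
    }

    // Convert the whole buffer once, so no BOM or endianness bookkeeping
    // is needed per line.
    $utf8 = mb_convert_encoding($raw, 'UTF-8', $encoding);

    $rows = [];
    foreach (preg_split('/\r\n|\r|\n/', $utf8) as $line) {
        if ($line === '') {
            continue; // skip blank lines, including the one after a trailing newline
        }
        $rows[] = str_getcsv($line, ',', '"', '\\');
    }
    return $rows;
}
```

Typical use would be `parse_utf16_csv(file_get_contents($filename))`. Converting the whole buffer up front sidesteps the problem the comment mentions, where the BOM is only present on the first line.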

0
