Error:
Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 3: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xE7 0x61 0x69 0x73
XML from the database (output from view source in FF):
<?xml version="1.0" encoding="UTF-8" ?><audit><audit_detail> <fieldname>role_fra</fieldname> <old_value>Role en français</old_value> <new_value>Role ç en français</new_value> </audit_detail></audit></xml>
If I understand correctly, the error is related to the first ç inside the old_value tag. To be precise, judging by the bytes, is the error about this part: "çais"?
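A minimal sketch of how those four bytes can be checked, assuming the mbstring extension is available. The variable names are illustrative and not part of my application code. It suggests that 0xE7 0x61 0x69 0x73 is not valid UTF-8 on its own, but reads as "çais" when interpreted as ISO-8859-1:

```php
<?php
// The four bytes reported by the parser (illustrative variable name).
$bytes = "\xE7\x61\x69\x73";

// 0xE7 opens a three-byte UTF-8 sequence, but 0x61 is not a continuation byte,
// so this string is not valid UTF-8.
$isValidUtf8 = mb_check_encoding($bytes, 'UTF-8');

// Interpreted as ISO-8859-1, 0xE7 is "ç"; converting yields the UTF-8 text "çais".
$asUtf8 = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');

var_dump($isValidUtf8);        // bool(false)
echo bin2hex($asUtf8), "\n";   // c3a7616973
```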
This is how I load the XML:
$xmlData = simplexml_load_string($ed['updates'][$i]['audit_data']);
I loop over it using this:
foreach ($xmlData->audit_detail as $a){
The field in the database has the text data type and its collation is utf8_general_ci.
My function for creating audit_detail stubs is:
function ed_audit_node($field, $new, $old){
    $old = htmlentities($old, ENT_QUOTES, "UTF-8");
    $new = htmlentities($new, ENT_QUOTES, "UTF-8");
    $out = <<<EOF
    <audit_detail>
        <fieldname>{$field}</fieldname>
        <old_value>{$old}</old_value>
        <new_value>{$new}</new_value>
    </audit_detail>
EOF;
    return $out;
}
Insertion into the database is performed as follows:
function ed_audit_insert($ed, $xml){
    global $visitor;
    $sql = <<<EOF
    INSERT INTO ed.audit
        (employee_id, audit_date, audit_action, audit_data, user_id)
    VALUES (
        {$ed[emp][employee_id]},
        now(),
        '{$ed[audit_action]}',
        '{$xml}',
        {$visitor[user_id]}
    );
EOF;
    $req = mysql_query($sql,$ed['db']) or die(db_query_error($sql,mysql_error(),__FUNCTION__));
}
The strangest part is that the following (without the XML declaration) works in a simple PHP file:
$testxml = <<<EOF <audit><audit_detail> <fieldname>role_fra</fieldname> <old_value>Role en français</old_value> <new_value>Role &
$xmlData = simplexml_load_string($testxml);
Can someone help shed some light on this?
Edit #1: I am now using DOM to create the XML document, and that got rid of the error. The function is here:
$dom = new DomDocument();
$root = $dom->appendChild($dom->createElement('audit'));
$xmlCount = 0;

if($role_fra != $curr['role']['role_fra']){
    $root->appendChild(ed_audit_node($dom, 'role_fra', $role_fra, $curr['role']['role_fra']));
    $xmlCount++;
}

...

function ed_audit_node($dom, $field, $new, $old){
    //create audit_detail node
    $ad = $dom->createElement('audit_detail');

    $fn = $dom->createElement('fieldname');
    $fn->appendChild($dom->createTextNode($field));
    $ad->appendChild($fn);

    $ov = $dom->createElement('old_value');
    $ov->appendChild($dom->createTextNode($old));
    $ad->appendChild($ov);

    $nv = $dom->createElement('new_value');
    $nv->appendChild($dom->createTextNode($new));
    $ad->appendChild($nv);

    //append to document
    return $ad;
}

if($xmlCount != 0){
    ed_audit_insert($ed,$dom->saveXML());
}
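To illustrate why the DOM approach avoids the parser error, here is a small round-trip sketch. It assumes the dom and simplexml extensions are loaded. The sample value is made up (written as explicit UTF-8 bytes so the file's own encoding does not matter), and passing the encoding to the DOMDocument constructor is my assumption, not what the code above does:

```php
<?php
// Hypothetical sample value, not taken from the real audit table.
// "\xC3\xA7" is "ç" encoded as UTF-8.
$value = "Role \xC3\xA7 & <en> fran\xC3\xA7ais";

// Assumption: declare the document encoding explicitly as UTF-8.
$dom  = new DOMDocument('1.0', 'UTF-8');
$root = $dom->appendChild($dom->createElement('audit'));

$nv = $dom->createElement('new_value');
// Text nodes are escaped on serialization, so & and < are safe here.
$nv->appendChild($dom->createTextNode($value));
$root->appendChild($nv);

$xml    = $dom->saveXML();
$parsed = simplexml_load_string($xml);

// Reading the node back as a string gives the original UTF-8 text.
var_dump((string) $parsed->new_value === $value);
```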
However, I think I now have a display problem, as this text "Roééleç sé en franêais" (new_value) displays as:
[Image: display problem]
In my HTML document, I have the following declaration for the content type (unfortunately, I do not have the keys to make changes here):
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"> ... <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
I tried iconv() to convert to ISO-8859-1; however, most of the special characters are removed by the conversion. All that remains is "Ro" when using this command:
iconv('UTF-8','ISO-8859-1',$node->new_value);
[Image: iconv output]
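For comparison, a minimal sketch of the same conversion on a string that is known to be valid UTF-8, assuming PHP 7+ (for the `\u{...}` escapes) with the iconv and mbstring extensions. The variable names are illustrative. Every character in this sample has an ISO-8859-1 equivalent, so a result cut off after "Ro" may mean that the bytes reaching iconv() are not the UTF-8 I expect, which is something I still need to verify:

```php
<?php
// The sample text, written with Unicode escapes so it is valid UTF-8
// regardless of how this file is saved.
$utf8 = "Ro\u{E9}\u{E9}le\u{E7} s\u{E9} en fran\u{EA}ais";

// Guard: only convert when the input really is valid UTF-8.
$isUtf8 = mb_check_encoding($utf8, 'UTF-8');

// Each character maps to exactly one ISO-8859-1 byte.
$latin1 = $isUtf8 ? iconv('UTF-8', 'ISO-8859-1', $utf8) : false;

var_dump($isUtf8);
var_dump(strlen($latin1) === mb_strlen($utf8, 'UTF-8'));
```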
The field in the database is utf8_general_ci. However, the connection charset is whatever the default is.
Not quite sure where to go from here ...
Edit #2: I tried utf8_decode() to see if it would help, but it doesn't.
utf8_decode($a->new_value);
[Image: output]
I also confirmed that my field in the database really does contain UTF-8, which is good.