I stopped using the built-in UTF-8 / UTF-16 decoding functions (the ones that convert to numeric entities of the form &#number;). I could not find a pattern explaining why UTF-8 was sometimes not detected; I suspect it is because the encoded sequence is not always located at the same position in the returned string. You can do an additional check.
UTF-8 three-byte indicator (the byte order mark): $startutf8 = chr(0xEF) . chr(187) . chr(191); (if you see this ANYWHERE in the string, and not just in the first three bytes, the string is encoded in UTF-8)
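A minimal sketch of that check, in case it helps. The helper names `starts_with_utf8_bom` and `contains_utf8_bom` are made up for illustration; they only wrap `strncmp` and `strpos`. Keep in mind that many UTF-8 strings carry no byte order mark at all, so a missing marker does not prove the string is not UTF-8.

```php
<?php
// UTF-8 byte order mark: the three bytes EF BB BF (same as chr(0xEF).chr(187).chr(191)).
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: true if the marker is the first three bytes of the string.
function starts_with_utf8_bom(string $s): bool
{
    return strncmp($s, UTF8_BOM, 3) === 0;
}

// Hypothetical helper: true if the marker occurs anywhere in the string.
function contains_utf8_bom(string $s): bool
{
    return strpos($s, UTF8_BOM) !== false;
}
```

The second helper matches the "ANYWHERE" rule above; the first is the stricter variant that only accepts the marker at the very start.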
Decode according to the rules of UTF-8. This replaced an earlier version that walked through the string byte by byte:

    function charset_decode_utf_8($string) {
        /* Only do the slow convert if there are 8-bit characters */
        /* avoid using 0xA0 (\240) in ereg ranges. RH73 does not like that */
        if (!ereg("[\200-\237]", $string) and !ereg("[\241-\377]", $string))
            return $string;

        // decode three byte unicode characters
        $string = preg_replace("/([\340-\357])([\200-\277])([\200-\277])/e",
            "'&#'.((ord('\\1')-224)*4096 + (ord('\\2')-128)*64 + (ord('\\3')-128)).';'",
            $string);

        // decode two byte unicode characters
        $string = preg_replace("/([\300-\337])([\200-\277])/e",
            "'&#'.((ord('\\1')-192)*64+(ord('\\2')-128)).';'",
            $string);

        return $string;
    }
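Note that `ereg()` and the `/e` modifier used in the function above were both removed in PHP 7. A rough sketch of the same logic for current PHP might look like the following. The name `charset_decode_utf_8_modern` is made up for illustration, and the swap to `preg_match` and `preg_replace_callback` is my substitution, not part of the original code. Like the original, it does not handle four-byte sequences.

```php
<?php
// Hypothetical name; mirrors charset_decode_utf_8() without ereg() or /e.
function charset_decode_utf_8_modern(string $string): string
{
    // Only do the slow conversion if there are 8-bit bytes.
    if (!preg_match('/[\x80-\xFF]/', $string)) {
        return $string;
    }

    // Three-byte sequences: 1110xxxx 10xxxxxx 10xxxxxx.
    $string = preg_replace_callback(
        '/([\xE0-\xEF])([\x80-\xBF])([\x80-\xBF])/',
        function (array $m): string {
            $code = (ord($m[1]) - 224) * 4096
                  + (ord($m[2]) - 128) * 64
                  + (ord($m[3]) - 128);
            return '&#' . $code . ';';
        },
        $string
    );

    // Two-byte sequences: 110xxxxx 10xxxxxx.
    $string = preg_replace_callback(
        '/([\xC0-\xDF])([\x80-\xBF])/',
        function (array $m): string {
            $code = (ord($m[1]) - 192) * 64 + (ord($m[2]) - 128);
            return '&#' . $code . ';';
        },
        $string
    );

    return $string;
}
```

For example, `charset_decode_utf_8_modern("caf\xC3\xA9")` returns `caf&#233;`, while a pure ASCII string is returned unchanged.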
Peters v