PHP utf8_decode problem

I have the following address string: Praha 5, Staré Město,

I need to run utf8_decode() on this string before I can write it to a PDF file (using the dompdf library).

However, the output of PHP's utf8_decode() for this address string looks incorrect (or rather, incomplete).

The following code:

<?php echo utf8_decode('Praha 5, Staré Město,'); ?> 

Produces the following:

Praha 5, Staré M?sto,

Any idea why ě is not decoded?

+7
4 answers

utf8_decode converts a string from UTF-8 to ISO-8859-1, also known as "Latin-1".
Latin-1 cannot represent the letter "ě". It's that simple.
"Decode" is a complete misnomer; the function performs the same conversion as iconv('UTF-8', 'ISO-8859-1', $string), except that it silently substitutes ? for any character Latin-1 cannot represent.

See "What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text".
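To make the point concrete, here is a minimal sketch that converts the address from the question to both Latin-1 and ISO-8859-2 (Latin-2, which covers Czech) and back again. It assumes the mbstring extension is loaded and that the script itself is saved as UTF-8. Whether a single-byte encoding is what the PDF library actually expects is a separate question that this sketch does not answer.

```php
<?php
$address = 'Praha 5, Staré Město,';

// ISO-8859-1 has no "ě", so mbstring has to substitute a placeholder character.
$latin1 = mb_convert_encoding($address, 'ISO-8859-1', 'UTF-8');

// ISO-8859-2 (Latin-2) contains both "é" and "ě", so nothing is lost.
$latin2 = mb_convert_encoding($address, 'ISO-8859-2', 'UTF-8');

// Convert both results back to UTF-8 so they can be compared and printed.
$fromLatin1 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
$fromLatin2 = mb_convert_encoding($latin2, 'UTF-8', 'ISO-8859-2');

echo $fromLatin1, "\n"; // the "ě" has been replaced
echo $fromLatin2, "\n"; // identical to the original string
```

The round trip through ISO-8859-2 is lossless for this string, while the round trip through ISO-8859-1 loses the "ě" for good.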

+14

You don't need it. (@Rajeev: this string is automatically detected as UTF-8 encoded:

 echo mb_detect_encoding('Praha 5, Staré Město,'); 

will always return UTF-8.)

Instead, have a look at: https://code.google.com/p/dompdf/wiki/CPDFUnicode

0

I gave up on the built-in UTF-8 / UTF-16 decoding functions and convert to &#number; representations (numeric character references) instead. I could not find a pattern for why UTF-8 was sometimes not detected; I suspect it is because the encoded sequence is not always located at exactly the same position in the returned string. You can do an additional check.

UTF-8 three-byte indicator (the byte order mark): $startutf8 = chr(0xEF) . chr(0xBB) . chr(0xBF); (if you see this ANYWHERE, and not just in the first three bytes, the string is encoded in UTF-8)
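A BOM is optional in UTF-8 and most strings do not carry one, so a validity check can complement the BOM test. The following is a minimal sketch, assuming the mbstring extension is available; the helper names hasUtf8Bom and looksLikeUtf8 are hypothetical and not part of any library.

```php
<?php
// The UTF-8 byte order mark: bytes EF BB BF.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: true if the string starts with the UTF-8 BOM.
function hasUtf8Bom(string $s): bool
{
    return strncmp($s, UTF8_BOM, 3) === 0;
}

// Hypothetical helper: true if the bytes form valid UTF-8 (with or without a BOM).
function looksLikeUtf8(string $s): bool
{
    return mb_check_encoding($s, 'UTF-8');
}

$plain   = 'Staré Město';      // valid UTF-8, no BOM
$withBom = UTF8_BOM . $plain;  // valid UTF-8, with BOM
$latin1  = "Star\xE9";         // ISO-8859-1 bytes, not valid UTF-8

var_dump(hasUtf8Bom($plain), looksLikeUtf8($plain));
var_dump(hasUtf8Bom($withBom), looksLikeUtf8($withBom));
var_dump(hasUtf8Bom($latin1), looksLikeUtf8($latin1));
```

Note that a validity check can only say that the bytes are well-formed UTF-8; it cannot prove that the author intended them as UTF-8.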

Decode according to the rules of UTF-8; this replaced an earlier version that decoded byte by byte:

    function charset_decode_utf_8($string) {
        // Only do the slow conversion if there are 8-bit characters.
        // 0xA0 is left out of the range, as in the original ereg() checks.
        if (!preg_match('/[\x80-\x9F\xA1-\xFF]/', $string)) {
            return $string;
        }
        // ereg() and the /e modifier were removed in PHP 7,
        // so preg_match() and preg_replace_callback() are used instead.
        // Decode three-byte UTF-8 sequences into &#number; references.
        $string = preg_replace_callback(
            '/([\xE0-\xEF])([\x80-\xBF])([\x80-\xBF])/',
            function ($m) {
                return '&#' . ((ord($m[1]) - 224) * 4096 + (ord($m[2]) - 128) * 64 + (ord($m[3]) - 128)) . ';';
            },
            $string
        );
        // Decode two-byte UTF-8 sequences.
        $string = preg_replace_callback(
            '/([\xC0-\xDF])([\x80-\xBF])/',
            function ($m) {
                return '&#' . ((ord($m[1]) - 192) * 64 + (ord($m[2]) - 128)) . ';';
            },
            $string
        );
        return $string;
    }
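For comparison, the mbstring extension can produce the same kind of &#number; references without hand-written regular expressions. This is a minimal sketch, assuming mbstring is loaded and the script is saved as UTF-8; the conversion map below is one common choice and not the only possible one.

```php
<?php
$text = 'Staré Město';

// Conversion map: [first code point, last code point, offset, mask].
// Every code point from U+0080 upward becomes a decimal reference;
// plain ASCII is left untouched.
$map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

$entities = mb_encode_numericentity($text, $map, 'UTF-8');
echo $entities, "\n";

// The same map reverses the conversion.
$restored = mb_decode_numericentity($entities, $map, 'UTF-8');
echo $restored, "\n";
```

Unlike the two- and three-byte patterns in the function above, this approach also covers four-byte sequences (code points above U+FFFF).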
0

The problem is the encoding of your PHP file. Save the file in UTF-8 and you do not even need utf8_decode. If you get the data 'Praha 5, Staré Město,' from the database, it is better to change its encoding to UTF-8.

0