I have a problem inserting and reading UTF-8 content from my database. Every check I run indicates that the contents of the database should be encoded in UTF-8, yet they appear to be encoded in Latin-1 (ISO-8859-1). The data was originally imported by a PHP script run from the CLI.
Configuration:
Zend Framework Version: 1.10.5
mysql-server-5.0: 5.0.51a-3ubuntu5.7
php5-mysql: 5.2.4-2ubuntu5.10
apache2: 2.2.8-1ubuntu0.16
libapache2-mod-php5: 5.2.4-2ubuntu5.10
Verifications:
-mysql:
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database   | utf8_bin        |
| collation_server     | utf8_general_ci |
+----------------------+-----------------+
-database
created with
CREATE DATABASE mydb CHARACTER SET utf8 COLLATE utf8_bin;
CREATE SCHEMA `mydb` DEFAULT CHARACTER SET utf8 COLLATE utf8_bin;

mysql> status;
--------------
mysql Ver 14.12 Distrib 5.0.51a, for debian-linux-gnu (i486) using readline 5.2

Connection id:          7
Current database:       mydb
Current user:           root@localhost
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         5.0.51a-3ubuntu5.7-log (Ubuntu)
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    utf8
Db characterset:        utf8
Client characterset:    utf8
Conn. characterset:     utf8
UNIX socket:            /var/run/mysqld/mysqld.sock
Uptime:                 9 min 45 sec
-sql: before doing my insertions I run
SET names 'utf8';
-php: before doing my insertions, I call utf8_encode() on the data and then check it with mb_detect_encoding(), which reports "UTF-8". After reading the content back from the database and before sending it to the user, mb_detect_encoding() also reports "UTF-8".
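Since mb_detect_encoding() only guesses from a list of candidate encodings, looking at the raw bytes may be a more reliable check. The following is a minimal sketch, not a diagnosis: the sample character "é" and the variable names are assumptions chosen for illustration. It shows what the same character looks like as Latin-1, as UTF-8, and as UTF-8 that has been passed through a Latin-1 to UTF-8 conversion a second time (which is the conversion utf8_encode() performs; mb_convert_encoding() is used here because utf8_encode() is deprecated in recent PHP versions).

```php
<?php
// Sample character "é" in two encodings (illustrative values only).
$latin1 = "\xE9";      // ISO-8859-1: one byte
$utf8   = "\xC3\xA9";  // UTF-8: two bytes

// Converting from ISO-8859-1 to UTF-8 is what utf8_encode() does.
// Applying it to data that is already UTF-8 encodes it a second time.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

// Dump the raw bytes as hex so the three cases can be told apart.
echo bin2hex($latin1), "\n";  // e9
echo bin2hex($utf8), "\n";    // c3a9
echo bin2hex($double), "\n";  // c383c2a9

// Strict validation: a lone 0xE9 byte is not valid UTF-8,
// but the double-encoded string is, so validity alone proves little.
var_dump(mb_check_encoding($latin1, 'UTF-8'));  // bool(false)
var_dump(mb_check_encoding($utf8, 'UTF-8'));    // bool(true)
var_dump(mb_check_encoding($double, 'UTF-8'));  // bool(true)
```

Running bin2hex() on a value just before the INSERT and again just after the SELECT would show which of these three forms the application actually handles at each step.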
Verification Check:
The only way I can get the content to display correctly is to set the default charset to Latin-1 (when I sniff the traffic, I see the Content-Type header with ISO-8859-1):
ini_set('default_charset', 'ISO-8859-1');
This test shows that the content comes out as Latin-1, and I do not understand why. Does anybody know what is going on?
Thanks.