Mysqldump exports data in a bad character set

Yesterday, I first exported my Mysql database, and I found some very strange characters in the dump, for example:

INSERT INTO `piwik_archive_blob_2013_01` VALUES (15,'Actions_actions_url_6',1,'2013-01-17','2013-01-17',1,'2013-01-20 07:36:53','xuNM0ý/œ#&ÝÕ³\ZõNYpÊÀì#!üw7Hж}°ÀAáZoN*šgµ\'GWª[Yûðe¯57 ÃÁÆ7|Ÿ\'Ü%µDh©-EÛ^ËL±ÕÞtª k@ (,b±ßZ.ÒÃ6b²aiÓÍ)87[ïΜ,æya¥uÒ<|+íª7MNuïÝ¿8ñ%1Ʊ>ÚX'); 

My MySQL server version: 5.1.66-0 + squeeze1 (Debian). This database was created automatically using the Piwik setup script.

Here is what I tried to solve this problem:

# 1 First, I checked the encoding of the database.

 > show table status; 

The 26 tables have a utf8_general_ci collation, which sounded quite normal. I assumed that mysqldump was exported to a different encoding (latin1?), So I tried:

 mysqldump -u user -p**** --all-databases --default-character-set=utf8 | gzip -9 > dump.sql.gz 

Result = I still had the same weird characters.

Note) I later learned that the default character set for mysqldump is utf8, regardless of the default server character set. So --default-character-set=utf8 was useless.

# 2 Then I thought I could solve the problem by updating mysql conf. The initial conf was:

 mysql> show variables like "%character%";show variables like "%collation%"; +--------------------------+----------------------------+ | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | latin1 | | character_set_connection | latin1 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | latin1 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ | +--------------------------+----------------------------+ +----------------------+-------------------+ | Variable_name | Value | +----------------------+-------------------+ | collation_connection | latin1_swedish_ci | | collation_database | latin1_swedish_ci | | collation_server | latin1_swedish_ci | +----------------------+-------------------+ 

So, I updated /var/lib/mysql/my.cnf and I added:

 [mysqld] init_connect='SET collation_connection = utf8_general_ci' init_connect='SET NAMES utf8' character-set-server=utf8 collation-server=utf8_general_ci default-character-set=utf8 default-collation=utf8_general_ci [mysqldump] default-character-set=utf8 

Then

 /etc/init.d/mysql restart mysql> show variables like "%character%";show variables like "%collation%"; +--------------------------+----------------------------+ | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | utf8 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ | +--------------------------+----------------------------+ +----------------------+-------------------+ | Variable_name | Value | +----------------------+-------------------+ | collation_connection | utf8_general_ci | | collation_database | latin1_swedish_ci | | collation_server | utf8_general_ci | +----------------------+-------------------+ 

Result = same weird characters.

# 3 I changed character_set_database and collation_database :

 mysql> ALTER DATABASE piwik default character SET utf8 collate utf8_general_ci; mysql> show variables like "%character%";show variables like "%collation%"; +--------------------------+----------------------------+ | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | utf8 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | utf8 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ | +--------------------------+----------------------------+ +----------------------+-----------------+ | Variable_name | Value | +----------------------+-----------------+ | collation_connection | utf8_general_ci | | collation_database | utf8_general_ci | | collation_server | utf8_general_ci | +----------------------+-----------------+ 

Result = same weird characters.

# 4 Now I realized that I had to change the default character set in MySQL (latin1) to utf8 before , creating the database.

Matching utf8_general_ci (# 1) means that the data is stored in utf8. However, is it possible that mysqldump believes that the data is stored in latin1 and encodes the data in utf8? This would mean that in the end the data is double utf8 encoded (sigh). In this case, how can I fix the problem?

Thank you for your help.

ps) I wonder why Piwik does not require changing the default encoding of the database in utf8.

+6
source share
2 answers

According to the table name "piwik_archive_ blob _2013_01", I think the column containing strange characters is of type BLOB.

BLOB columns contain binary data. That is why it contains these strange characters. Expected.

Don't worry about it, I'm sure MysqlDump knows how to dump this data.

Cheers, Eric.

0
source

Perhaps the operating system changes the encoding during export and ignores the default character set parameter.

To make sure that the export does not use the encoding of the operating system, use the parameter result file

Check out this article: http://nathan.rambeck.org/blog/1-preventing-encoding-issues-mysqldump

0
source

All Articles