Is the "UTF8" data encoded correctly in my database?

I have a PHP application with a MySQL database that is supposed to contain UTF-8 encoded data. As far as Unicode characters are concerned, my application works correctly from start to finish: if someone submits "Strömgren" to my database (via an HTML form), I see "Strömgren" when I read the data back.

My database tables are all UTF8, and my HTML pages and forms all use charset=utf-8.

I recently noticed that in one part of my application Unicode characters come out double-encoded. Where I should have seen Strömgren, I saw StrÃ¶mgren: Str\xc3\xb6mgren vs. Str\xc3\x83\xc2\xb6mgren. If I utf8_decode the bad string, it looks correct again.

I assume this is "double encoding".
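To make the byte sequences above concrete, here is a minimal PHP sketch that produces the doubled bytes and then reverses them. It assumes the mbstring extension is available and uses ISO-8859-1 as a stand-in for MySQL's latin1; the function names doubleEncode and undoDoubleEncode are made up for this illustration.

```php
<?php
// Sketch: reproduce "double encoding" on the string from the question.
// Assumption: mbstring is loaded; ISO-8859-1 approximates MySQL's latin1.

function doubleEncode(string $utf8): string
{
    // Treat every UTF-8 byte as one latin1 character and encode it as UTF-8 again.
    return mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
}

function undoDoubleEncode(string $doubled): string
{
    // Reverse step: map each character back to a single byte.
    return mb_convert_encoding($doubled, 'ISO-8859-1', 'UTF-8');
}

$good = "Str\xc3\xb6mgren";      // "Strömgren" as UTF-8 bytes
$bad  = doubleEncode($good);

echo bin2hex($good), "\n";       // 537472c3b66d6772656e
echo bin2hex($bad), "\n";        // 537472c383c2b66d6772656e
echo bin2hex(undoDoubleEncode($bad)), "\n";
```

The ö (c3b6) becomes c383 c2b6 after one extra round of encoding, and undoing that round gives back the original bytes, which matches the observation that utf8_decode makes the bad string look right again.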

I found that the double-encoded part of the application uses different code to connect to the database, and this code makes this call:

$db->set_charset("utf8");

I intended to do this for ALL of my database connections, but somehow it ended up in only one place. So almost all of my application uses connections without the set_charset call, and there Strömgren always looks right, while the single piece of code that does call set_charset("utf8") (and which only ever reads from the database, never writes to it) displays it incorrectly.
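One explanation consistent with this behaviour is that the connections without set_charset run as latin1 and convert in both directions. The PHP sketch below only simulates that conversion with mbstring (no database is involved). storeInUtf8Column and readFromUtf8Column are hypothetical helpers, and ISO-8859-1 is an approximation of MySQL's latin1.

```php
<?php
// Sketch only: models a connection-charset conversion with mbstring.
// Assumption: the column is utf8 and ISO-8859-1 approximates latin1.

// What ends up in a utf8 column when the client sends $bytes
// over a connection declared as $connCharset.
function storeInUtf8Column(string $bytes, string $connCharset): string
{
    return mb_convert_encoding($bytes, 'UTF-8', $connCharset);
}

// What the client receives when reading a utf8 column
// over a connection declared as $connCharset.
function readFromUtf8Column(string $stored, string $connCharset): string
{
    return mb_convert_encoding($stored, $connCharset, 'UTF-8');
}

$typed  = "Str\xc3\xb6mgren";                            // UTF-8 bytes from the HTML form
$stored = storeInUtf8Column($typed, 'ISO-8859-1');       // written without set_charset

$viaLatin1 = readFromUtf8Column($stored, 'ISO-8859-1');  // most of the application
$viaUtf8   = readFromUtf8Column($stored, 'UTF-8');       // the one set_charset("utf8") connection

var_dump($viaLatin1 === $typed);  // bool(true): the round trip hides the problem
var_dump($viaUtf8 === $typed);    // bool(false): the stored bytes come through as they are
```

Under these assumptions the latin1 connections cancel their own conversion, so the page looks right even though the stored bytes differ from what was typed.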

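So, is the data in my database really UTF8? With set_charset("utf8") I get StrÃ¶mgren. Without it the connection is presumably latin1 (the default), the data comes back as latin1, and my HTML page with "charset=utf-8" shows Strömgren.

My questions:

First, how can I tell whether the data in the database is stored correctly?

Second, if it is not, how do I fix it (i.e. turn StrÃ¶mgren into Strömgren)?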


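To see what is actually stored in the database, use the HEX function. (It plays a role similar to the DUMP() function in Oracle.)

For example, using HEX: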

  CREATE TABLE foo 
  ( foo_lat VARCHAR(10) CHARSET latin1
  , foo_utf VARCHAR(10) CHARSET utf8
  );

  INSERT INTO foo (foo_lat, foo_utf) VALUES
  ( UNHEX('6dc3b1c3b6'), UNHEX('6dc3b1c3b6') );

  SELECT foo_lat
       , foo_utf
       , HEX(foo_lat)
       , HEX(foo_utf)
    FROM foo ;

foo_lat    foo_utf  HEX(foo_lat)  HEX(foo_utf)
---------  -------  ------------  ------------
mÃ±Ã¶      mñö      6DC3B1C3B6    6DC3B1C3B6

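set_charset sets the character set of the client connection in mysqli.

To check which character set a connection is currently using, call: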

  $db->character_set_name();

... or, from SQL, inspect the character set variables:

 SELECT @@session.character_set_client
      , @@session.character_set_connection
      , @@session.character_set_results
      , @@session.character_set_server
      , @@global.character_set_client
      , @@global.character_set_connection
      , @@global.character_set_results
      , @@global.character_set_system;

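... the "session" values describe the current connection, the "global" values are the defaults for new connections, and the session values for client, connection and results are what set_charset changes.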

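If the connection character set is latin1, MySQL treats every byte the client sends as one latin1 character.

So if the client actually sends UTF-8 bytes over a connection declared as latin1, each byte is taken as a latin1 character. When those characters are stored in a utf8 column, each one is converted to utf8 again, and the data ends up "double encoded".

When the same latin1 connection reads the data back, the conversion runs in reverse, so the original bytes come out and everything looks right. Only a connection that is really set to utf8 shows what is stored.

In short: the client sends UTF-8, the connection says latin1, the column is utf8, and the stored data is double encoded.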

(Before changing anything, take a backup, for example a .sql file produced with mysqldump, which ships with MySQL.)


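Note that the character set can also be specified per column in CREATE TABLE, so the table default alone does not tell the whole story.

A SHOW CREATE TABLE foo will show this.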


First, check how the table and its columns are defined:

SHOW FULL COLUMNS FROM table_name;

SHOW CREATE TABLE table_name;

To convert a table to UTF-8:

ALTER TABLE tbl_name
CONVERT TO CHARACTER SET utf8;

If a column is declared as latin1 but actually contains UTF-8 data:

ALTER TABLE table_name CHANGE field field blob;
ALTER TABLE table_name CHANGE field field text charset utf8;

Seeing StrÃ¶mgren instead of Strömgren is Mojibake.

If SELECT HEX(...) FROM ... shows 53 74 72 C3B6 6D 67 72 65 6E (spaces added here for readability), the data is stored correctly as utf8. C3B6 is the utf8 hex for ö.

"Double encoding" will show 53 74 72 C383 C2B6 6D 67 72 65 6E, where C383 and C2B6 are the utf8 hex for Ã and ¶.
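The check described above can be sketched in PHP as a rough heuristic. classifyHex is a hypothetical helper, not part of any library; it assumes mbstring is loaded, uses ISO-8859-1 to approximate latin1, and will misjudge text that legitimately contains sequences such as Ã¶.

```php
<?php
// Heuristic sketch: classify the output of SELECT HEX(col).
// Assumption: mbstring is loaded; ISO-8859-1 approximates MySQL's latin1.

function classifyHex(string $hex): string
{
    $hex = str_replace(' ', '', $hex);
    if ($hex === '' || strlen($hex) % 2 !== 0 || !ctype_xdigit($hex)) {
        return 'not hex';
    }
    $bytes = hex2bin($hex);
    if (!mb_check_encoding($bytes, 'UTF-8')) {
        return 'not utf8';
    }
    if (!preg_match('/[\x80-\xFF]/', $bytes)) {
        return 'ascii only';    // ASCII looks identical either way
    }
    // Strip one layer of encoding; if valid multi-byte UTF-8 remains,
    // the value was most likely encoded twice.
    $once = mb_convert_encoding($bytes, 'ISO-8859-1', 'UTF-8');
    if (preg_match('/[\x80-\xFF]/', $once) && mb_check_encoding($once, 'UTF-8')) {
        return 'probably double encoded';
    }
    return 'utf8';
}

echo classifyHex('53 74 72 C3B6 6D 67 72 65 6E'), "\n";       // utf8
echo classifyHex('53 74 72 C383 C2B6 6D 67 72 65 6E'), "\n";  // probably double encoded
```

Feeding it the HEX() value of a few rows that contain non-ASCII characters should be enough to tell which of the two cases applies.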

See the duplicate question for a discussion and solution, including how to recover the data with a couple of ALTER TABLEs.

That is, Jose and Spencer had elements of a complete answer.
