Special characters will not work in MySQL (UTF-8)

So, I had some problems migrating from Latin1-encoded databases, tables and columns, and now that everything is finally in UTF-8, I can't update a row in a column. I am trying to replace "e" with e acute (é), but this gives me the following:

ERROR 1366 (HY000): Incorrect string value: '\x82m...' for column 'Name' at row 1

when doing this:

UPDATE access SET Name='ém' WHERE id="2";

All databases give me this when I run the status command (except for the current database part, of course):


 Connection id:          1
 Current database:       access
 Current user:           root@localhost
 SSL:                    Not in use
 Using delimiter:        ;
 Server version:         5.1.47-community MySQL Community Server (GPL)
 Protocol version:       10
 Connection:             localhost via TCP/IP
 Server characterset:    utf8
 Db characterset:        utf8
 Client characterset:    utf8
 Conn. characterset:     utf8
 TCP port:               3306
 Uptime:                 20 min 16 sec

 Threads: 1  Questions: 110  Slow queries: 0  Opens: 18  Flush tables: 1  Open tables: 11  Queries per second avg: 0.90

And running the chcp command in cmd gives me 850. Oh, and in some cases I got this instead:

ERROR 1300 (HY000): Invalid utf8 character string: 'ém' WHERE id = "2"

I searched everywhere for a solution but couldn't find anything, and since I have always had good answers on Stack Overflow, I thought I would ask here.

Thanks for any help!

+4
4 answers

This thread, although somewhat old, seems to reach the conclusion that cmd.exe and the mysql client do not handle UTF-8 (with most of the blame falling on cmd.exe).

Reading the SQL in from a file is recommended, as is using an alternative client, or a UNIX flavor. :)

+3

The solution is to set the connection variables to whatever code page your Windows installation uses (not latin1, as many pages recommend; the character encoding of cmd.exe is not latin1).

In my case, code page 850:

mysql> SET NAMES cp850;

Here is an example connection with UTF-8:

 mysql> show variables like '%char%';
 +--------------------------+--------------------------------+
 | Variable_name            | Value                          |
 +--------------------------+--------------------------------+
 | character_set_client     | utf8                           |
 | character_set_connection | utf8                           |
 | character_set_database   | utf8                           |
 | character_set_filesystem | binary                         |
 | character_set_results    | utf8                           |
 | character_set_server     | utf8                           |
 | character_set_system     | utf8                           |
 | character_sets_dir       | C:\xampp\mysql\share\charsets\ |
 +--------------------------+--------------------------------+
 8 rows in set (0.00 sec)

Here's what happens with accented characters:

 mysql> select nom from assignatura where nom like '%prob%';
 +---------------------------------------+
 | nom                                   |
 +---------------------------------------+
 | Probabilitat i Processos Estoc├ástics |
 | Probabilitat i Processos Estoc├ástics |
 +---------------------------------------+
 2 rows in set (0.03 sec)

Note the extraneous symbol before the á. The accent also points the wrong way: it should be à.
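The garbling described above can be reproduced outside MySQL. The following is a minimal Python sketch, assuming the server sends the result as UTF-8 and the console decodes it as code page 850; the helper name `as_seen_in_console` is made up for illustration and is not part of any MySQL tooling.

```python
# Sketch: what a cp850 console shows when it receives bytes in another encoding.
# as_seen_in_console is a hypothetical helper, not part of the mysql client.

def as_seen_in_console(text, sent_as="utf-8", console="cp850"):
    """Encode text the way the server sends it, then decode it the way the console does."""
    return text.encode(sent_as).decode(console)

# UTF-8 encodes "à" as the two bytes 0xC3 0xA0. In cp850 those bytes are the
# box-drawing character "├" and the letter "á", hence the extra symbol and
# the accent pointing the wrong way.
garbled = as_seen_in_console("Estocàstics")
assert garbled == "Estoc├ástics"

# If the results are sent as cp850 instead, the console shows the right character.
correct = as_seen_in_console("Estocàstics", sent_as="cp850")
assert correct == "Estocàstics"
```

This only models the byte-level mismatch; it says nothing about how the client or server implement the conversion internally.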

After executing SET NAMES cp850;

 mysql> show variables like '%char%';
 +--------------------------+--------------------------------+
 | Variable_name            | Value                          |
 +--------------------------+--------------------------------+
 | character_set_client     | cp850                          |
 | character_set_connection | cp850                          |
 | character_set_database   | utf8                           |
 | character_set_filesystem | binary                         |
 | character_set_results    | cp850                          |
 | character_set_server     | utf8                           |
 | character_set_system     | utf8                           |
 | character_sets_dir       | C:\xampp\mysql\share\charsets\ |
 +--------------------------+--------------------------------+
 8 rows in set (0.00 sec)

Finally, we get the correct accented character:

 mysql> select nom from assignatura where nom like '%prob%';
 +--------------------------------------+
 | nom                                  |
 +--------------------------------------+
 | Probabilitat i Processos Estocàstics |
 | Probabilitat i Processos Estocàstics |
 +--------------------------------------+
 2 rows in set (0.00 sec)
+4

When you type text at the command line, the strings are in whatever encoding the terminal uses. Why the mysql client does not convert this before sending it to the database still puzzles me, but it doesn't. You are probably sending latin1 to the database.

You can save the SQL update in a text file, make sure this text file is UTF-8, and run something like: type myfile.txt | mysql db_name
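One way to be sure the file really is UTF-8 is to write it from a script rather than trust an editor's default. Here is a minimal Python sketch, assuming the statement from the question and the myfile.txt name used above; `write_utf8_sql` and `is_valid_utf8` are hypothetical helper names.

```python
from pathlib import Path

# Sketch: write a SQL statement to disk as UTF-8 and check the result.
# Both helpers are illustrative names, not part of any MySQL tooling.

def write_utf8_sql(path, statement):
    """Write one SQL statement to path as UTF-8 without a BOM; return the bytes written."""
    data = (statement + "\n").encode("utf-8")
    Path(path).write_bytes(data)
    return data

def is_valid_utf8(path):
    """Return True if the whole file decodes cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Calling `write_utf8_sql("myfile.txt", "UPDATE access SET Name='ém' WHERE id=\"2\";")` stores é as the bytes 0xC3 0xA9, and `is_valid_utf8("myfile.txt")` then returns True. The file only helps if the client character set is utf8 as well, which the status output in the question suggests is already the case.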

0

Well ... 0x82 is e-acute on code page 850. It would be 0xE9 in ISO-8859-1, and 0xC3 0xA9 in UTF-8. I don't know if there is a good way to get a DOS window to accept UTF-8 input correctly. Here is an alternative if you are using the command line client. You can set the client character set to match your local code page and let the mysql library take care of the transcoding for you:

 c:\> mysql --default-character-set=cp850
 mysql> \s
 --------------
 mysql Ver 14.14 Distrib 5.1.34, for apple-darwin9.6.0 (i386) using readline 5.2

 Connection id:          17
 Current database:
 Current user:           daveshawley@localhost
 SSL:                    Not in use
 Current pager:          stdout
 Using outfile:          ''
 Using delimiter:        ;
 Server version:         5.1.34-log Source distribution
 Protocol version:       10
 Connection:             localhost via TCP/IP
 Server characterset:    ucs2
 Db characterset:        ucs2
 Client characterset:    cp850
 Conn. characterset:     cp850
 TCP port:               3306
 Uptime:                 19 days 8 hours 37 min 55 sec

 Threads: 2  Questions: 248  Slow queries: 0  Opens: 71  Flush tables: 1  Open tables: 64  Queries per second avg: 0.0
 --------------

I know this works for a combination of latin1 in one window and utf8 in another window on my MacBook. I also confirmed that ALTER TABLE ... CONVERT TO CHARACTER SET ucs2 did the right thing.
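The byte values involved are easy to check for yourself. Below is a small Python sketch of the idea, assuming Python's cp850, latin-1 and utf-8 codecs match the encodings MySQL uses under the same names; `decodes_as_utf8` and `transcode` are made-up helpers that only model the conversion and are not how the mysql library implements it.

```python
# Sketch: the same character in three encodings, and why 0x82 is rejected as utf8.

E_ACUTE = "é"
cp850_bytes = E_ACUTE.encode("cp850")     # single byte 0x82
latin1_bytes = E_ACUTE.encode("latin-1")  # single byte 0xE9
utf8_bytes = E_ACUTE.encode("utf-8")      # two bytes 0xC3 0xA9

def decodes_as_utf8(raw):
    """Return True if raw is a well-formed UTF-8 byte sequence."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def transcode(raw, declared, target="utf-8"):
    """Model the conversion: interpret raw in the declared charset, re-encode in target."""
    return raw.decode(declared).encode(target)

# What a cp850 console sends when 'ém' is typed.
typed = "ém".encode("cp850")

# Declared as utf8, the bytes are malformed: 0x82 can only be a continuation byte.
assert not decodes_as_utf8(typed)

# Declared as cp850, the bytes convert cleanly to the UTF-8 form stored in the table.
assert transcode(typed, "cp850") == utf8_bytes + b"m"
```

This is consistent with the '\x82m' shown in the error message from the question: the console sent cp850 bytes over a connection that had been declared utf8.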

0

Source: https://habr.com/ru/post/1316365/

