Support for four-byte Chinese characters in MySQL

I cannot execute this SQL statement:

INSERT INTO `mabase`.`new_table` (`idnew_table`, `name`) VALUES ('2', '𠼭');

Error:

ERROR 1366: Incorrect string value: '\xF0\xA0\xBC\xAD' for column 'name' at row 1
SQL statement: INSERT INTO `mabase`.`new_table` (`idnew_table`, `name`) VALUES ('2', '𠼭')

My database and table use the utf8 character set with the utf8_general_ci collation. I also tried utf8_unicode_ci, utf8mb4_general_ci, big5_chinese_ci and gbk_chinese_ci.

I tried all of this in MySQL Workbench on Windows.

𠼭 is a four-byte character, and I only have problems with characters like it. How can I store a four-byte character in MySQL?

2 answers

The character you want, U+20F2D, is in the “CJK Unified Ideographs Extension B” block of the Supplementary Ideographic Plane and was therefore not available in any MySQL Unicode character set prior to version 5.5; since version 5.5, it is available in the utf8mb4, utf16, utf16le (from 5.6.1) and utf32 character sets.

It is not available in MySQL's big5 or gbk character sets.
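To see which character sets on a particular server can hold a four-byte character, one option is to list those whose maximum character length is four bytes. This is a minimal sketch, assuming a MySQL 5.5 or later server; the exact list returned depends on the server version and build.

```sql
-- List character sets whose characters may occupy up to four bytes.
-- On MySQL 5.5+ this is expected to include utf8mb4, utf16 and utf32.
SHOW CHARACTER SET WHERE `Maxlen` = 4;

-- Show the character sets in effect for the client, connection and server.
-- The connection character set matters as well as the column's.
SHOW VARIABLES LIKE 'character_set%';
```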


Why the utf8 encoding does not work

As described in Unicode Support:

The initial implementation of Unicode support (in MySQL 4.1) included two character sets for storing Unicode data:

  • ucs2, the UCS-2 encoding of the Unicode character set using 16 bits per character.

  • utf8, a UTF-8 encoding of the Unicode character set using one to three bytes per character.

These two character sets support the characters from the Basic Multilingual Plane (BMP) of Unicode Version 3.0. BMP characters have these characteristics:

  • Their code values are between 0 and 65535 (or U+0000 .. U+FFFF).

  • They can be encoded with a fixed 16-bit word, as in ucs2.

  • They can be encoded with 8, 16 or 24 bits, as in utf8.

  • They are sufficient for almost all characters in major languages.

Characters not supported by the aforementioned character sets include supplementary characters that lie outside the BMP. Characters outside the BMP compare as REPLACEMENT CHARACTER and convert to '?' when converted to a Unicode character set.
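The byte count reported in the error message can be checked directly. The following sketch, assuming MySQL 5.5 or later, builds the character from the bytes shown in the error (F0 A0 BC AD) with the _utf8mb4 introducer and compares its length in characters with its length in bytes. The variable name @c is only illustrative.

```sql
-- Build U+20F2D from its UTF-8 bytes, as reported in ERROR 1366.
SET @c = _utf8mb4 X'F0A0BCAD';

SELECT CHAR_LENGTH(@c) AS chars,        -- length in characters
       LENGTH(@c)      AS bytes,        -- length in bytes
       HEX(@c)         AS utf8mb4_hex;  -- the stored byte sequence

-- The three-byte utf8 character set cannot represent this character;
-- per the passage above, the conversion is expected to yield '?'.
SELECT CONVERT(@c USING utf8) AS as_utf8;
```

One character occupying four bytes is exactly what the three-byte utf8 character set has no room for.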

In MySQL 5.6, Unicode support includes supplementary characters, which requires new character sets that have a broader range and therefore take more space. The following table shows a brief comparison of the features of previous and current Unicode support.

 ╔══════════════════════════════╦══════════════════════════════════════════════╗
 ║ Before MySQL 5.5             ║ MySQL 5.5 and up                             ║
 ╠══════════════════════════════╬══════════════════════════════════════════════╣
 ║ All Unicode 3.0 characters   ║ All Unicode 5.0 and 6.0 characters           ║
 ╠══════════════════════════════╬══════════════════════════════════════════════╣
 ║ No supplementary characters  ║ With supplementary characters                ║
 ╠══════════════════════════════╬══════════════════════════════════════════════╣
 ║ ucs2 character set, BMP only ║ No change                                    ║
 ╠══════════════════════════════╬══════════════════════════════════════════════╣
 ║ utf8 character set for up to ║ No change                                    ║
 ║ three bytes, BMP only        ║                                              ║
 ╠══════════════════════════════╬══════════════════════════════════════════════╣
 ║                              ║ New utf8mb4 character set for up to four     ║
 ║                              ║ bytes, BMP or supplemental                   ║
 ╠══════════════════════════════╬══════════════════════════════════════════════╣
 ║                              ║ New utf16 character set, BMP or supplemental ║
 ╠══════════════════════════════╬══════════════════════════════════════════════╣
 ║                              ║ New utf16le character set, BMP or            ║
 ║                              ║ supplemental (5.6.1 and up)                  ║
 ╠══════════════════════════════╬══════════════════════════════════════════════╣
 ║                              ║ New utf32 character set, BMP or supplemental ║
 ╚══════════════════════════════╩══════════════════════════════════════════════╝

These changes are upward compatible. If you want to use the new character sets, there are potential incompatibility issues for your applications; see Section 10.1.11, “Upgrading from Previous to Current Unicode Support”. That section also describes how to convert tables from utf8 to the (4-byte) utf8mb4 character set, and what constraints may apply in doing so.
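For the table in the question, the conversion could look roughly like the following. This is a sketch, assuming MySQL 5.5 or later and the mabase.new_table names from the question; utf8mb4_unicode_ci is an arbitrary choice of collation. Long indexed columns may run into index length limits after conversion, which is one of the constraints the referenced section covers.

```sql
-- Make the connection use utf8mb4 so the client can send four-byte characters.
SET NAMES utf8mb4;

-- Convert the existing table and all of its character columns.
ALTER TABLE `mabase`.`new_table`
  CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Retry the insert from the question.
INSERT INTO `mabase`.`new_table` (`idnew_table`, `name`) VALUES ('2', '𠼭');

-- Inspect the stored bytes; a correctly stored U+20F2D should read F0A0BCAD.
SELECT `idnew_table`, HEX(`name`) FROM `mabase`.`new_table`;
```

Converting the table alone is not sufficient if the connection still uses utf8, which is why the sketch starts with SET NAMES.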

Why the big5 encoding does not work

As described in What Problems Should You Know When Using the Big5 Chinese Character Set?:

MySQL supports the Big5 character set, which is common in Hong Kong and Taiwan (Republic of China). MySQL's big5 is in reality Microsoft code page 950, which is very similar to the original big5 character set.

  [ deletia ] 

A feature request for adding HKSCS extensions has been filed. People who need this extension may find the suggested patch for Bug #13577 to be of interest.

Why the gbk encoding does not work

As described in What CJK Character Sets are Available in MySQL?:

Here, we try to clarify exactly which characters are legitimate in gb2312 or gbk, with reference to the official documents. Please check these references before reporting gb2312 or gbk bugs.
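Whether a particular character has a mapping in big5 or gbk can also be tested on the server. This is a rough sketch, assuming MySQL 5.5 or later with the big5 and gbk character sets compiled in; a character with no mapping in the target set is expected to come back as '?' (hex 3F).

```sql
-- U+20F2D built from its UTF-8 bytes.
SET @c = _utf8mb4 X'F0A0BCAD';

-- Convert to each legacy Chinese character set and inspect the resulting bytes.
SELECT HEX(CONVERT(@c USING big5)) AS big5_hex,
       HEX(CONVERT(@c USING gbk))  AS gbk_hex;
```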


These 2 commands will enable support for Chinese characters in your database.

ALTER DATABASE CHARACTER SET 'utf8mb4' COLLATE 'utf8mb4_unicode_ci';

ALTER TABLE `new_table` CONVERT TO CHARACTER SET DEFAULT COLLATE DEFAULT;

Short and simple.

Hope this helps.
