Select Unicode character U+2028 in MySQL 5.1

I am trying to select the Unicode character \u2028 in MySQL 5.1. MySQL 5.1 supports utf8 and ucs2.

In newer versions of MySQL, I could select the character using the utf16 or utf32 character set:

 SELECT char(0x2028 using utf16);
 SELECT char(0x00002028 using utf32);

But MySQL 5.1 does not support utf16 and utf32. How can I select this Unicode character there?

Perhaps a few words about my use case. I have a third-party application that stores data in a MySQL database and uses JavaScript for the user interface. The application does not handle the Unicode characters \u2028 and \u2029, which are valid in JSON but break JavaScript code. (For more details see http://timelessrepo.com/json-isnt-a-javascript-subset ). I would like to know how much data is affected by this problem, and maybe use MySQL to replace the characters.


To demonstrate the problem:

 CREATE TABLE IF NOT EXISTS `test` (
   `id` int(11) NOT NULL AUTO_INCREMENT,
   `string` varchar(100) CHARACTER SET utf8 NOT NULL,
   PRIMARY KEY (`id`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=3;

 INSERT INTO `test` (`id`, `string`) VALUES
   (1, 'without U+2028'),
   (2, 'with U+2028 at this " "point');

 SELECT * FROM test WHERE string LIKE CONCAT("%", char(0x2028 using utf16), "%"); -- returns row 2 as expected

 SELECT * FROM test WHERE string LIKE CONCAT("%", char(??? using utf8), "%");
 -- U+2028 in utf8 is 0xE2 0x80 0xA8, isn't it?
 -- But how do I pass this to the char function?
1 answer

The Unicode character U+2028 is encoded in UTF-8 as the hexadecimal bytes e280a8. So the answer is to use the UNHEX function in MySQL to find it.

 SELECT * FROM test WHERE string LIKE CONCAT("%", UNHEX('e280a8'), "%"); 
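
As a rough cross-check, and to answer the literal question of how to pass the bytes to char, the following sketch may help. It assumes that the _ucs2 introducer accepts a hex literal on your server, and it relies on CHAR() splitting arguments larger than 255 into several result bytes. I have not verified either statement on a 5.1 server, so treat both as something to try rather than a guarantee.

```sql
-- Derive the UTF-8 bytes of U+2028 from its code point via ucs2,
-- which MySQL 5.1 supports. Expected result: E280A8.
SELECT HEX(CONVERT(_ucs2 0x2028 USING utf8));

-- CHAR() turns an argument larger than 255 into several bytes, so the
-- three UTF-8 bytes can be passed as a single number.
SELECT * FROM test
WHERE string LIKE CONCAT('%', CHAR(0xE280A8 USING utf8), '%');
```

If the first statement returns something other than E280A8, the UNHEX form above is the safer choice.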

MySQL 5.1 can only handle UTF-8 encoded characters of up to three bytes. Thus, searching for U+2028 using UNHEX will work, but searching for U+1F600 will not, as it takes four bytes.

Use UNHEX('e280a9') to search for U+2029.
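
For the use case in the question (finding out how much data is affected and possibly replacing it), a minimal sketch against the `test` table could look like this. The alias affected_rows and the choice of a plain space as replacement are assumptions for illustration. The UPDATE wraps UNHEX in CONVERT(... USING utf8) so that REPLACE works on utf8 strings rather than on a binary string. Run it on a backup or a copy of the table first.

```sql
-- Count rows that contain U+2028 or U+2029.
SELECT COUNT(*) AS affected_rows
FROM test
WHERE string LIKE CONCAT('%', UNHEX('e280a8'), '%')
   OR string LIKE CONCAT('%', UNHEX('e280a9'), '%');

-- Replace both separators with a plain space (the replacement
-- character is an assumption; pick whatever suits the application).
UPDATE test
SET string = REPLACE(
               REPLACE(string, CONVERT(UNHEX('e280a8') USING utf8), ' '),
               CONVERT(UNHEX('e280a9') USING utf8), ' ')
WHERE string LIKE CONCAT('%', UNHEX('e280a8'), '%')
   OR string LIKE CONCAT('%', UNHEX('e280a9'), '%');
```

Running the COUNT again afterwards should return 0 if the replacement caught every occurrence.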
