Does anyone know of a reliable method (with MySQL or otherwise) for selecting rows in a database that contain Japanese characters? I have many rows in my database; some contain only alphanumeric characters, and some contain Japanese characters.
Rules to follow when you want to avoid problems with character sets:
When creating the database, use utf8 encoding:
CREATE DATABASE _test DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
Make sure all text fields (varchar and text) use UTF-8:
CREATE TABLE _test.test ( id INT NOT NULL AUTO_INCREMENT, name VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL, PRIMARY KEY (id) ) ENGINE = MyISAM;
When you establish a connection, do this before querying / updating the database:
SET NAMES utf8;
With phpMyAdmin, select UTF-8 at login.
Set the encoding of the web page to UTF-8 so that all posted and received data is in UTF-8 (otherwise you will have to convert it, painfully). PHP code (put it on the first line of the PHP file, or at least before any output):
header('Content-Type: text/html; charset=UTF-8');
Make sure all your queries are written in UTF-8 encoding. When using PHP:
6.1. If PHP supports code in UTF-8, just write your files in UTF-8.
6.2. If PHP is compiled without UTF-8 support, convert strings to UTF-8 as follows:
$str = mb_convert_encoding($str, 'UTF-8', '<put your file encoding here>');
$query = 'SELECT * FROM test WHERE name = "' . $str . '"';
That should do it.
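The conversion step in 6.2 can be illustrated outside PHP. Below is a minimal Python sketch, not the answer's own code: the helper name to_utf8 and the Shift_JIS sample are assumptions made for illustration. It decodes bytes from a known source encoding and re-encodes them as UTF-8, which is roughly what mb_convert_encoding does for a string.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from source_encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# Hypothetical sample: the katakana letter KA (U+30AB) stored as Shift_JIS bytes.
sjis_bytes = "カ".encode("shift_jis")
utf8_bytes = to_utf8(sjis_bytes, "shift_jis")

# The two byte sequences differ, which is why comparing an unconverted
# string against a UTF-8 column finds no match.
print(sjis_bytes.hex(), utf8_bytes.hex())
```

The source encoding has to be known in advance; guessing it wrong produces garbage rather than an error in many cases.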
Following NickSoft's helpful answer, I had to set the encoding on the db connection to make it work.
&characterEncoding=UTF-8
After that, SET NAMES utf8; seemed redundant.
As teneff said, just use SELECT.
When installing MySQL, use UTF-8 as the encoding. Then choosing utf8_general_ci as the collation should do the job.
There are a limited number of Japanese characters. You can search them with
SELECT ... LIKE '%カ%'
Alternatively, you can try the hexadecimal notation of the character's UTF-8 bytes (カ is U+30AB, which is E3 82 AB in UTF-8):
SELECT ... LIKE CONCAT('%', CHAR(0xE382AB USING utf8), '%')
You can find the Japanese (katakana) part of the UTF-8 table at http://www.utf8-chartable.de/unicode-utf8-table.pl?start=12448
This assumes you use the UTF-8 character set for fields, queries, and results.
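To work out the hexadecimal form of a character for a query like this, it helps to distinguish the Unicode code point from its UTF-8 byte sequence. The following is a small Python sketch; the function name utf8_hex is hypothetical and only serves the illustration.

```python
def utf8_hex(code_point: int) -> str:
    """Return the UTF-8 byte sequence of a Unicode code point as uppercase hex."""
    return chr(code_point).encode("utf-8").hex().upper()


# U+30AB is the katakana letter KA used in the LIKE example.
print(utf8_hex(0x30AB))  # E382AB
```

The code point (30AB) is what the Unicode charts list; the three bytes (E3 82 AB) are what is actually stored in a UTF-8 column.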
As Frosty said, just use SELECT.
Look up the lowest and highest Japanese characters in the Unicode charts at http://www.unicode.org/roadmaps/bmp/ and use REGEXP. You may need several character ranges to cover the entire Japanese character set. As long as you are using UTF-8 and the utf8_general_ci collation, you should be able to use REGEXP '[a-gk-nt-z]', where a-g represents one range of Unicode characters from the charts, k-n represents another range, and so on.
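The same range-based idea can be tried on the client side after fetching rows. This is a minimal Python sketch under stated assumptions: the three ranges (hiragana U+3040 to U+309F, katakana U+30A0 to U+30FF, CJK unified ideographs U+4E00 to U+9FFF) are one common choice rather than a complete definition of "Japanese", and the names JAPANESE_RE, contains_japanese, and the sample rows are made up for illustration.

```python
import re

# Assumed ranges: hiragana, katakana, and CJK unified ideographs.
# The ideograph range is shared with Chinese, so it can over-match.
JAPANESE_RE = re.compile("[\u3040-\u309f\u30a0-\u30ff\u4e00-\u9fff]")


def contains_japanese(text: str) -> bool:
    """Return True if text has at least one character in the assumed ranges."""
    return JAPANESE_RE.search(text) is not None


# Hypothetical rows, standing in for values read from the name column.
rows = ["abc123", "カタカナ", "hello ひらがな", "漢字 test"]
japanese_rows = [row for row in rows if contains_japanese(row)]
print(japanese_rows)
```

Filtering in application code avoids depending on how a particular database version handles multi-byte characters in regular expressions, at the cost of fetching rows that are then discarded.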